DIAGNOSIS, PROGNOSIS AND IDENTIFICATION OF POTENTIAL
THERAPEUTIC TARGETS OF MULTIPLE MYELOMA BASED ON GENE
EXPRESSION PROFILING

BACKGROUND OF THE INVENTION 

Field of the Invention 

The present invention relates generally to the field of cancer research.

More specifically, the present invention relates to gene expression profiling of plasma
cells from normal individuals and from patients with multiple myeloma and monoclonal
gammopathy of undetermined significance.

Description of the Related Art

Multiple myeloma (MM) is a uniformly fatal tumor of terminally
differentiated plasma cells (PCs) that home to and expand in the bone marrow. Although
the initial transformation events leading to multiple myeloma are thought to occur at a
post-germinal center stage of development, as suggested by the presence of somatic
hypermutation of IGV genes, progress in understanding the biology and genetics of
multiple myeloma, and in advancing its therapy, has been slow.

WO 03/053215    PCT/US02/35724

Multiple myeloma cells are endowed with a multiplicity of anti-apoptotic
signaling mechanisms that account for their resistance to current chemotherapy and thus
for the ultimately fatal outcome for most patients. Although aneuploidy is detected by
interphase fluorescence in situ hybridization (FISH) and DNA flow cytometry in more
than 90% of cases, cytogenetic abnormalities in this typically hypoproliferative tumor are
informative in only about 30% of cases and are typically complex, involving on average 7
different chromosomes. Given this "genetic chaos", it has been difficult to establish
correlations between genetic abnormalities and clinical outcomes. Only recently has
chromosome 13 deletion been identified as a distinct clinical entity with a grave
prognosis. However, even with the most comprehensive analysis of laboratory
parameters, such as β2-microglobulin (β2M), C-reactive protein (CRP), plasma cell
labeling index (PCLI), metaphase karyotyping, and FISH, the clinical course of patients
with multiple myeloma can only be approximated, because these parameters account for
no more than 20% of the clinical heterogeneity. This suggests that there are distinct
clinical subgroups of multiple myeloma, which modern molecular tests may be able to
identify.

Monoclonal gammopathy of undetermined significance (MGUS) and
multiple myeloma are the most frequent forms of monoclonal gammopathies.
Monoclonal gammopathy of undetermined significance is the most common plasma cell
dyscrasia, with an incidence of up to 10% in the population over age 75. The molecular
basis of monoclonal gammopathy of undetermined significance and multiple myeloma is
not well understood, and the two disorders are not easy to differentiate. Using
classification systems based on a combination of clinical criteria, such as the extent of
bone marrow plasmacytosis, the concentration of monoclonal immunoglobulin in urine or
serum, and the presence of bone lesions, the diagnosis of multiple myeloma or monoclonal
gammopathy of undetermined significance is identical in 2/3 of cases. Especially in early
phases of multiple myeloma, the differential diagnosis is associated with a certain degree
of uncertainty.

Furthermore, in diagnosing multiple myeloma, the clinician must
exclude other disorders in which a plasma cell reaction may occur, such as rheumatoid
arthritis and connective tissue disorders, or metastatic carcinoma, in which the patient may
have osteolytic lesions associated with bone metastases. Therefore, given that multiple
myeloma is thought to have an extended latency and that its clinical features are
recognized many years after the development of the malignancy, new molecular diagnostic
techniques are needed to screen for the disease and to provide differential diagnosis for
multiple myeloma, e.g., monoclonal gammopathy of undetermined significance versus
multiple myeloma, or the recognition of various subtypes of multiple myeloma.

Thus, the prior art is deficient in methods for the differential diagnosis of multiple
myeloma and for identifying distinct and prognostically relevant clinical subgroups of
multiple myeloma. The present invention fulfills this long-standing need and desire in the art.

SUMMARY OF THE INVENTION

Bone marrow plasma cells from 74 patients with newly diagnosed
multiple myeloma, 5 with monoclonal gammopathy of undetermined significance
(MGUS), and 31 normal volunteers (normal plasma cells) were purified by CD138+
selection. Gene expression of the purified plasma cells and of 7 multiple myeloma cell
lines was profiled using high-density oligonucleotide microarrays interrogating ~6,800
genes. On hierarchical clustering analysis, normal and multiple myeloma plasma cells were
differentiated, and four distinct subgroups of multiple myeloma (MM1, MM2, MM3
and MM4) were identified. The expression patterns of MM1 were similar to those of
normal plasma cells and monoclonal gammopathy of undetermined significance, whereas
MM4 was similar to multiple myeloma cell lines. Clinical parameters linked to poor
prognosis, such as abnormal karyotype (P = 0.0003) and high serum β2-microglobulin
levels (P = 0.0004), were most prevalent in MM4. Genes involved in DNA metabolism
and cell cycle control were overexpressed in MM4 compared with MM1.

Using chi-square and Wilcoxon rank sum tests, 120 novel candidate
disease genes that discriminated between normal and malignant plasma cells (P < .0001)
were identified. Many of these candidate genes are involved in adhesion, apoptosis, cell
cycle, drug resistance, growth arrest, oncogenesis, signaling and transcription. In
addition, a total of 156 genes, including FGFR3 and CCND1, exhibited highly elevated
("spiked") expression in at least 4 of the 74 multiple myeloma cases (range of spikes: 4
to 25). Elevated expression of FGFR3 and CCND1 was caused by the translocations
t(4;14)(p16;q32) and t(11;14)(q13;q32), respectively.
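A per-gene Wilcoxon rank-sum screen at the P < .0001 cutoff quoted above can be sketched as follows. This is a minimal illustration that assumes a normal approximation without tie correction; the chi-square part of the screen is not shown, and the function names and the dict-of-lists input layout are hypothetical rather than the procedure actually used.

```python
import math


def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal tail probability
    return w, z, p


def screen_genes(normal, myeloma, alpha=1e-4):
    """Genes whose expression differs between the groups at P < alpha.

    normal, myeloma: hypothetical dicts mapping gene name -> expression values.
    """
    hits = []
    for gene in normal:
        _, _, p = rank_sum_test(normal[gene], myeloma[gene])
        if p < alpha:
            hits.append(gene)
    return hits


# Toy data: gene "A" separates the groups completely, gene "B" does not.
normal = {"A": list(range(15)), "B": list(range(0, 30, 2))}
myeloma = {"A": list(range(100, 115)), "B": list(range(1, 31, 2))}
print(screen_genes(normal, myeloma))  # ['A']
```

With 15 samples per group, complete separation gives z of about -4.67 and P of about 3e-6, which passes the cutoff; with much smaller groups the normal approximation cannot reach P < .0001 at all.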

The present invention also identifies, through multivariate stepwise
discriminant analysis, a minimum subset of genes whose expression is intimately
associated with the malignant features of multiple myeloma. Fourteen genes were
defined as predictors that are able to differentiate plasma cells of multiple myeloma
patients from normal plasma cells with a high degree of accuracy, and 24 genes were
identified as predictors that are able to differentiate the distinct subgroups of multiple
myeloma (MM1, MM2, MM3 and MM4) described herein.
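The 14- and 24-gene predictors were obtained by stepwise discriminant analysis, which is not reproduced here. As a deliberately simpler stand-in, the sketch below shows how a small panel of predictor genes can assign a sample to the class whose mean profile (centroid) is nearest. This nearest-centroid rule is a swapped-in technique, and the toy values, class labels, and function names are hypothetical.

```python
import math


def centroids(samples, labels):
    """Mean expression vector of each class over the selected predictor genes."""
    sums, counts = {}, {}
    for vec, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(vec, centers):
    """Return the class whose centroid is closest to vec (Euclidean distance)."""
    return min(centers, key=lambda label: math.dist(vec, centers[label]))


# Toy training set: two hypothetical predictor genes per sample.
train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels = ["normal", "normal", "myeloma", "myeloma"]
centers = centroids(train, labels)
print(classify([5.5, 4.5], centers))  # myeloma
```

A genuine discriminant analysis would also account for covariance between genes and would add genes stepwise according to their discriminating power; neither step is attempted in this sketch.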

Furthermore, data disclosed herein indicate that multiple myeloma can be
placed into a developmental schema parallel to that of normal plasma cell differentiation.
Based on gene expression profiling, the MM4, MM3 and MM2 subgroups described
above were found to have similarity with tonsil B cells, tonsil plasma cells and bone
marrow plasma cells, respectively. These data suggest that the enigmatic multiple
myeloma is amenable to a gene expression/developmental stage-based classification
system.

In one aspect of the present invention, there are provided methods of
using DNA microarrays and hierarchical clustering analysis to classify subgroups of
multiple myeloma, identify genes with elevated expression in subsets of multiple
myeloma patients, and identify potential therapeutic targets for multiple myeloma.

In another aspect of the present invention, there are provided methods of
identifying groups of genes that can either differentiate plasma cells of multiple myeloma
patients from normal plasma cells or distinguish between distinct subgroups of multiple
myeloma.

In still another aspect of the present invention, there are provided
methods of diagnosis for multiple myeloma or subgroups of multiple myeloma based on
the expression of a group of 14 genes or a group of 24 genes, respectively.

In yet another aspect of the present invention, there are provided methods
of treatment for multiple myeloma. Such methods involve inhibiting or enhancing the
expression of genes that are found to be overexpressed or downregulated, respectively, in
multiple myeloma patients as disclosed herein.

The present invention also provides a method of developmental stage-based
classification for multiple myeloma that is based on comparing the gene expression
profiles of multiple myeloma cells with those of normal B cells or plasma cells.

Other and further aspects, features, and advantages of the present
invention will be apparent from the following description of the presently preferred
embodiments of the invention. These embodiments are given for the purpose of
disclosure.



BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above-recited features, advantages and
objects of the invention, as well as others which will become clear, are attained and can be
understood in detail, more particular descriptions and certain embodiments of the
invention briefly summarized above are illustrated in the appended drawings. These
drawings form a part of the specification. It is to be noted, however, that the appended
drawings illustrate preferred embodiments of the invention and therefore are not to be
considered limiting in their scope.

Figure 1A shows a cluster-ordered data table. The clustering is presented
graphically as a colored image. Along the vertical axis, the analyzed genes are arranged as
ordered by the clustering algorithm. The genes with the most similar patterns of
expression are placed adjacent to each other. Likewise, along the horizontal axis,
experimental samples are arranged; those with the most similar patterns of expression
across all genes are placed adjacent to each other. Both sample and gene groupings can be
further described by following the solid lines (branches) that connect the individual
components with the larger groups. The color of each cell in the tabular image represents
the expression level of each gene, with red representing an expression greater than the
mean, green representing an expression less than the mean, and deeper color intensity
representing a greater magnitude of deviation from the mean.
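The color coding described for Figure 1A can be sketched as follows, assuming each gene's values are centered on that gene's mean across samples and that color intensity scales linearly up to a saturation value. The function names and the `max_abs` saturation parameter are hypothetical and are not taken from the software used to draw the figure.

```python
def center_rows(matrix):
    """Subtract each gene's mean (row mean) so values become deviations from the mean."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([v - mean for v in row])
    return centered


def deviation_to_rgb(deviation, max_abs):
    """Map a deviation to an (R, G, B) tuple: red above the mean, green below.

    Intensity grows linearly with |deviation| and saturates at max_abs
    (a hypothetical display parameter).
    """
    intensity = min(abs(deviation) / max_abs, 1.0) if max_abs > 0 else 0.0
    level = round(255 * intensity)
    if deviation > 0:
        return (level, 0, 0)   # above the gene's mean: red
    return (0, level, 0)       # at or below the mean: green (black when level is 0)


# One gene measured in four samples.
row = center_rows([[2.0, 4.0, 6.0, 8.0]])[0]   # deviations: [-3.0, -1.0, 1.0, 3.0]
colors = [deviation_to_rgb(d, max_abs=3.0) for d in row]
```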

Figure 1B shows an amplified gene cluster containing genes downregulated in
MM. Most of the characterized and sequence-verified cDNA-encoded genes are known
to be immunoglobulins.

Figure 1C shows a cluster enriched with genes whose expression level was
correlated with tumorigenesis, cell cycle, and proliferation rate. Many of these genes
were also statistically significantly upregulated in multiple myeloma (χ2 and WRS tests)
(see Table 5).
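The χ2 test named alongside the rank-sum test can be illustrated with a 2 × 2 Pearson test. The sketch assumes (this is an assumption, not stated in this passage) that the test compares how often a gene is called "present" in normal versus myeloma plasma cells. With one degree of freedom the upper-tail probability follows exactly from the complementary error function. The function name and counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for the table

        [[a, b],   row 1: normal plasma cells  (present calls, absent calls)
         [c, d]]   row 2: myeloma plasma cells (present calls, absent calls)

    Assumes no row or column total is zero. Returns (statistic, p_value).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, chi-square = Z**2, so P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy counts: gene called present in 10 of 10 myeloma samples and 0 of 10 normals.
stat, p = chi_square_2x2(0, 10, 10, 0)
```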

Figure 1D shows the dendrogram of a hierarchical cluster analysis. Seventy-four cases
of newly diagnosed untreated multiple myeloma, 5 monoclonal gammopathy of undetermined
significance, 8 multiple myeloma cell lines, and 31 normal bone marrow plasma cell
samples were clustered based on the correlation of 5,483 genes (probe sets).
Different-colored branches represent normal plasma cells (green), monoclonal gammopathy
of undetermined significance (blue arrow), multiple myeloma (tan) and multiple myeloma
cell lines (brown arrow).
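The correlation-based clustering of samples referred to in Figures 1D and 1E can be illustrated with a small agglomerative procedure. The sketch assumes a distance of 1 − r, where r is the Pearson correlation between two sample profiles, and average linkage; the software and linkage rule actually used are not specified in this passage, and the function names and toy profiles are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


def average_linkage(profiles):
    """Agglomerative clustering of samples with distance 1 - r and average linkage.

    profiles: dict mapping sample name -> expression values over the same genes.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    names = list(profiles)
    # Pairwise sample distances, computed once up front.
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b]) for a in names for b in names}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster sample pairs.
                d = sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


profiles = {
    "N1": [1.0, 2.0, 3.0, 4.0],
    "N2": [1.0, 2.0, 3.0, 5.0],
    "M1": [4.0, 3.0, 2.0, 1.0],
    "M2": [5.0, 3.0, 2.0, 1.0],
}
merges = average_linkage(profiles)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

Reading the merge history from first to last reproduces the dendrogram from the leaves upward; the toy "N" and "M" profiles join within their own pairs first and meet only at the final, most distant merge.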

Figure 1E shows the dendrogram of a hierarchical cluster analysis of 74 cases
of newly diagnosed untreated multiple myeloma alone (clustergram not shown). Two
major branches contained two distinct cluster groups. The subgroups under the right
branch, designated MM1 (light blue) and MM2 (blue), were more related to the
monoclonal gammopathy of undetermined significance cases in Figure 1D. The two
subgroups under the left branch, designated MM3 (violet) and MM4 (red), represent
samples that were more related to the multiple myeloma cell lines in Figure 1D.

Figure 2 shows the spike profile distributions of FGFR3, CST6, IFI27,
and CCND1 gene expression. The normalized average difference (AD) value of
fluorescence intensity of streptavidin-phycoerythrin stained biotinylated cRNA as
hybridized to probe sets is on the vertical axis and samples are on the horizontal axis.
The samples are ordered from left to right: normal plasma cells (NPCs) (green), MM1
(light blue), MM2 (dark blue), MM3 (violet), and MM4 (red). Note the relatively low
expression in the 31 normal plasma cell samples and the spiked expression in subsets of
multiple myeloma samples. The P values of the test for significant nonrandom spike
distributions are noted.
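The criterion used to call a "spike" is not given in this passage. Purely as an illustration, the sketch below flags a myeloma sample as spiked when its value exceeds the normal plasma cell mean by more than k standard deviations, and keeps genes with at least 4 spiked samples, mirroring the "at least 4 of the 74" threshold stated earlier. The value of k, the function names, and the toy data are hypothetical, and the test for nonrandom spike distributions is not reproduced.

```python
import statistics


def spiked_samples(normal, myeloma, k=5.0):
    """Indices of myeloma samples whose value exceeds mean(normal) + k * SD(normal)."""
    cutoff = statistics.mean(normal) + k * statistics.stdev(normal)
    return [i for i, value in enumerate(myeloma) if value > cutoff]


def spiked_genes(normal_by_gene, myeloma_by_gene, min_spikes=4, k=5.0):
    """Map each gene with at least min_spikes spiked myeloma samples to their indices."""
    result = {}
    for gene, normal in normal_by_gene.items():
        hits = spiked_samples(normal, myeloma_by_gene[gene], k)
        if len(hits) >= min_spikes:
            result[gene] = hits
    return result


normal = {"GENE_A": [10, 12, 11, 9, 10, 11], "GENE_B": [10, 12, 11, 9, 10, 11]}
myeloma = {
    "GENE_A": [11, 500, 10, 300, 12, 800, 450, 9],   # four spikes
    "GENE_B": [11, 500, 10, 12, 9, 10, 11, 12],      # one spike
}
print(spiked_genes(normal, myeloma))  # {'GENE_A': [1, 3, 5, 6]}
```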

Figure 3A shows GeneChip HuGeneFL analysis of MS4A2 (CD20) gene
expression. The normalized average difference (AD) value of fluorescence intensities of
streptavidin-phycoerythrin stained biotinylated cRNA as hybridized to two independent
probe sets (accession numbers M27394 (blue) and X12530 (red)), located in different
regions of the MS4A2 gene, is on the vertical axis and samples are on the horizontal axis.
Note the relatively low expression in 31 normal plasma cells (NPCs) and the spiked
expression in 5 of 74 multiple myeloma samples (multiple myeloma plasma cells). Also
note the similarity in expression levels detected by the two different probe sets.

Figure 3B shows immunohistochemistry for CD20 expression on clonal
multiple myeloma plasma cells: (1) bone marrow biopsy section showing asynchronous-type
multiple myeloma cells (H&E, ×500); (2) CD20+ multiple myeloma cells (×100;
inset ×500); (3) biopsy from a patient with mixed asynchronous and Marschalko-type
multiple myeloma cells (H&E, ×500); and (4) CD20+ single lymphocyte and CD20−
multiple myeloma cells (×200). CD20 immunohistochemistry was examined without
knowledge of clinical history or gene expression findings.

Figure 4 shows that gene expression correlates with protein expression.
Gene and protein expression of CD markers known to be differentially expressed during
B-cell differentiation were compared between the multiple myeloma cell line CAG (left
panel) and the Epstein-Barr virus (EBV)-transformed B-lymphoblastoid line ARH-77
(right panel). In both panels, the 8 CD markers are listed in the left column. Flow
cytometric analysis of protein expression is presented in the second column; the average
difference (AD) and absolute call (AC) values of gene expression are presented in the
third and fourth columns. Note the strong expression of both the gene and protein for
CD138 and CD38 in the CAG cells but the low expression in the ARH-77 cells. The
opposite correlation is observed for the remaining markers.

Figure 5 shows multivariate discriminant analysis of 14 features of all
normal plasma cells, multiple myeloma samples, monoclonal gammopathy of undetermined
significance and multiple myeloma cell lines. This scatterplot resulted from the orthogonal
projection of the value for each case onto the plane defined by the 2 centers. The green
plots represent normal plasma cells; the blue plots represent multiple myeloma plasma cells
and multiple myeloma cell lines; the pink plots represent monoclonal gammopathy of
undetermined significance.

Figure 6A shows 269 cases of multiple myeloma, 7 multiple myeloma
cell lines, 7 monoclonal gammopathy of undetermined significance and 32 normal plasma
cell samples clustered based on the correlation of 5,483 genes (probe sets). Two major
branches contained two distinct cluster groups. The subgroup including normal plasma
cell samples contained 1 monoclonal gammopathy of undetermined significance sample
(green arrow) and 2 misclassified multiple myeloma samples (pink arrow). Figure 6B
shows an amplified sample cluster showing samples connecting to the normal group.

Figure 7 shows multivariate discriminant analysis of 24 features of all
multiple myeloma, monoclonal gammopathy of undetermined significance and multiple
myeloma cell line samples. This scatterplot resulted from the orthogonal projection of the
value for each case onto the plane defined by the 4 centers. The red plots represent the
MM1 subgroup; the green plots represent the MM2 subgroup; the blue plots represent the
MM3 subgroup; the pink plots represent the MM4 subgroup; the light blue plots represent
ungrouped cases; and the large yellow plots represent the group centers.

Figure 8A shows endothelin B receptor (ENDBR) expression in normal
plasma cells and in approximately 200 myeloma patients, P1 through P226, as indicated
by the mean fluorescence intensity of the microarray data depicted on the Y axis.
Figure 8B shows endothelin B receptor expression in normal plasma cells and in
newly diagnosed myeloma patients.

Figure 9A shows the expression of endothelin B receptor (ENDBR) in
feeder cells and myeloma cells P323 and P322 before and after co-culture. Figure 9B
shows the expression of endothelin 1 in feeder cells and myeloma cells P323 and P322
before and after co-culture.

Figure 10 shows flow cytometric, immunofluorescence and cytological 
analysis of normal B cell and plasma cell samples. 

CD19-Selected Tonsil B cells: Tonsil-derived mononuclear fractions were
tested for the percentage of tonsil B cells prior to anti-CD19 immunomagnetic bead
sorting by two-color FACS analysis with antibodies to CD20/CD38 (a and b). The
post-sorting purity of the tonsil B cell sample was determined by CD20/CD38 (c and d),
CD138/CD20 (e), and CD138/CD38 (f) staining. Cytospin preparations of the purified
tonsil B cell samples were stained with Wright-Giemsa and cell morphology was observed
with light microscopy (g). Purified B cells were also stained with AMCA and FITC
antibodies against cytoplasmic immunoglobulin (cIg) light chain (κ and λ) and observed
by immunofluorescence microscopy (h). Note the lack of cIg staining and thus minimal
plasma cell contamination in the tonsil B cell fraction.

CD138-Selected Tonsil Plasma Cells: Tonsil mononuclear fractions were
tested for the percentage of plasma cells prior to anti-CD138 immunomagnetic bead sorting
by two-color FACS analysis using antibodies to CD38/CD45 (i) and CD138/CD45
(j). The post-sorting purity of the tonsil plasma cell samples was determined by dual-color
FACS analysis of CD38/CD45 (k), CD138/CD45 (l), CD38/CD20 (m), and
CD138/CD38 (n). Cytospin preparations of the purified tonsil plasma cells were
analyzed for morphological appearance (o) and cIg (p).

CD138-Selected Bone Marrow Plasma Cells: Mononuclear fractions from
bone marrow aspirates were tested for the percentage of plasma cells prior to anti-CD138
immunomagnetic bead sorting by two-color FACS analysis using antibodies to
CD38/CD45 (q) and CD138/CD45 (r). The post-sorting purity of the bone marrow
plasma cell sample was determined by dual-color FACS analysis of CD38/CD45 (s),
CD138/CD45 (t), CD38/CD20 (u), and CD138/CD38 (v). Cytospin preparations of the
purified bone marrow plasma cells were analyzed for morphological appearance (w) and
cIg (x). Note the high percentage of cIg-positive bone marrow plasma cells with clear
plasma cell morphologic characteristics.

Figure 11 shows two-dimensional hierarchical cluster analysis of normal
human B cells and plasma cells. Included were 7 tonsil B cell (TBC), 7 tonsil plasma cell
(TPC), and 7 bone marrow plasma cell (BPC) samples, clustered based on the correlation
of experimental expression profiles of 4866 probe sets. The clustering is presented
graphically as a colored image. Along the vertical axis, the analyzed genes are arranged as
ordered by the clustering algorithm. The genes with the most similar patterns of
expression are placed adjacent to each other. Experimental samples are similarly arranged
along the horizontal axis. The color of each cell in the tabular image represents the
expression level of each gene, with red representing an expression greater than the mean,
green representing an expression less than the mean, and deeper color intensity
representing a greater magnitude of deviation from the mean. The top dendrogram
produces two major branches separating tonsil BCs from PCs. In addition, within the PC
cluster, tonsil PCs and bone marrow PCs are separated on three unique branches.

Figure 12 shows two-dimensional hierarchical cluster analysis of
experimental expression profiles and gene behavior of 30 EDG-MM genes. B cells, tonsil
and bone marrow plasma cells, and multiple myeloma (MM) samples were analyzed using a
cluster-ordered data table. The tonsil B cell, tonsil plasma cell, and bone marrow plasma
cell samples are indicated by red, blue, and golden bars, respectively. The nomenclature
for the 74 MM samples is as indicated in Zhan et al. (2002). Along the vertical axis, the
analyzed genes are arranged as ordered by the clustering algorithm. The genes with the
most similar patterns of expression are placed adjacent to each other. Both sample and
gene groupings can be further described by following the solid lines (branches) that
connect the individual components with the larger groups. The tonsil B cell cluster is
identified by the horizontal red bar. The color of each cell in the tabular image represents
the expression level of each gene, with red representing an expression greater than the
mean, green representing an expression less than the mean, and deeper color intensity
representing a greater magnitude of deviation from the mean.

Figure 13 shows two-dimensional hierarchical cluster analysis of
experimental expression profiles and gene behavior of 50 LDG-MM1 genes. Genes are
plotted along the vertical axis (right side), and experimental samples are plotted along the
top horizontal axis by their similarity. The tonsil plasma cell cluster is identified by a
horizontal blue bar. Tonsil B cell, tonsil plasma cell, and bone marrow plasma cell
samples are indicated as in Figure 12.

Figure 14 shows two-dimensional hierarchical cluster analysis of
experimental expression profiles and gene behavior of 50 LDG-MM2 genes. Genes are
plotted along the vertical axis (right side), and experimental samples are plotted along the
top horizontal axis by their similarity. The bone marrow plasma cell cluster is identified
by a horizontal golden bar. Tonsil B cell, tonsil plasma cell, and bone marrow plasma cell
samples are indicated as in Figure 12.

Figure 15 shows that variation in the expression of proliferation genes reveals
similarities between tonsil B cells and MM4. The data are shown as boxplots annotated
with Kruskal-Wallis test P values. The seven groups analyzed (tonsil B cells, tonsil plasma
cells, bone marrow plasma cells, and the gene expression-defined subgroups MM1, MM2,
MM3, and MM4) are distributed along the x-axis, and the natural log-transformed average
difference is plotted on the y-axis. EZH2, P = 7.61 × 10⁻¹¹; KNSL1, P = 3.21 × 10⁻⁸;
PRKDC, P = 2.86 × 10⁻¹¹; SNRPC, P = 5.44 × 10⁻¹²; CCNB1, P = 2.54 × 10⁻⁸;
CKS2, P = 9.49 × 10⁻¹¹; CKS1, P = 5.86 × 10⁻⁹; PRIM1, P = 4.25 × 10⁻⁵.
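The Kruskal-Wallis statistic behind P values like these can be sketched as follows. This version omits the tie correction, and its closed-form chi-square tail applies only to an even number of degrees of freedom (seven groups give 6). Whether the figure's P values used a tie correction is not stated, so the sketch is illustrative only, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of sample groups."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable; this closed form holds only for even df."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    term, series = 1.0, 1.0
    # Sum of (x/2)**j / j! for j = 0 .. df/2 - 1.
    for j in range(1, df // 2):
        term *= half / j
        series += term
    return math.exp(-half) * series


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis(groups)                   # about 7.2
p = chi2_sf_even_df(h, df=len(groups) - 1)   # exp(-3.6), about 0.027
```

For odd degrees of freedom a general chi-square survival function (for example one based on the regularized incomplete gamma function) would be needed instead of the closed form used here.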

Figure 16 shows the receiver operating characteristic (ROC) curves for
the multiple myeloma (MM) vs monoclonal gammopathy of undetermined significance
(MGUS) classification.
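An ROC curve of this kind is obtained by sweeping a decision threshold over a classifier score and plotting the true-positive rate against the false-positive rate. The sketch assumes a single numeric score per sample, with multiple myeloma as the positive class and MGUS as the negative class; the scores and function names are hypothetical, and the area under the curve is computed with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """(false-positive rate, true-positive rate) points from a descending threshold sweep.

    labels: 1 for the positive class (e.g. MM), 0 for the negative class (e.g. MGUS).
    Assumes both classes are present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Samples with tied scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.3, 0.2]   # hypothetical classifier scores
labels = [1, 1, 0, 1, 0]             # 1 = MM, 0 = MGUS
print(round(auc(roc_points(scores, labels)), 3))  # 0.833
```

The trapezoidal area equals the fraction of (MM, MGUS) sample pairs in which the MM sample receives the higher score, here 5 of 6 pairs.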



DETAILED DESCRIPTION OF THE INVENTION 

There is now strong evidence that global gene expression profiling can
reveal a molecular heterogeneity of similar or related hematopoietic malignancies that
have been difficult to distinguish. The most significantly differentially expressed genes in
a comparison of normal and malignant cells can be used in the development of clinically
relevant diagnostics, as well as provide clues to the basic mechanisms of cellular
transformation. In fact, these profiles might be used to identify malignant cells even
in the absence of any clinical manifestations. In addition, the biochemical pathways in
which the products of these genes act may be targeted by novel therapeutics.
The present invention demonstrates that both normal and malignant
plasma cells can be purified to homogeneity from bone marrow aspirates using
anti-CD138-based immunomagnetic bead-positive selection. Using these cells, the present
invention provides the first comprehensive global gene expression profiling of newly
diagnosed multiple myeloma patients and contrasts these expression patterns with
those of normal plasma cells. Novel candidate multiple myeloma disease genes were
identified using the method of gene expression profiling disclosed herein, and this
profiling has led to the development of a gene-based classification system for multiple
myeloma.

Hierarchical cluster analysis of multiple myeloma and normal plasma cells,
together with the benign plasma cell dyscrasia monoclonal gammopathy of undetermined
significance and the end-stage-like multiple myeloma cell lines, revealed that normal
plasma cells are unique and that primary multiple myeloma resembles either monoclonal
gammopathy of undetermined significance or multiple myeloma cell lines. In addition,
multiple myeloma cell line gene expression was homogeneous, as evidenced by the tight
clustering in the hierarchical analysis. The similarity of multiple myeloma cell line
expression patterns to primary newly diagnosed forms of multiple myeloma supports the
validity of using multiple myeloma cell lines as models for multiple myeloma.

Upon hierarchical clustering of multiple myeloma samples alone, four distinct
clinical multiple myeloma subgroups (MM1 to MM4) were distinguished. The MM1
subgroup contained samples that were more like monoclonal gammopathy of
undetermined significance, whereas the MM4 subgroup contained samples more like
multiple myeloma cell lines. The most significant gene expression patterns differentiating
MM1 and MM4 involved cell cycle control and DNA metabolism genes, and the MM4
subgroup was more likely to have abnormal cytogenetics, elevated serum β2M, elevated
creatinine, and deletions of chromosome 13. These variables have historically been
linked to poor prognosis.



Gene Expression Changes in Multiple Myeloma 

Data disclosed herein indicate that the MM4 subgroup likely represents
the highest-risk clinical entity. Thus, knowledge of the molecular genetics of this
particular subgroup should provide insight into its biology and possibly provide a
rationale for appropriate subtype-specific therapeutic interventions. The most
significant gene expression changes differentiating the MM1 and MM4 subgroups code
for activities that clearly implicate MM4 as having a more proliferative and autonomous
phenotype. The most significantly altered gene in the comparison, TYMS (thymidylate
synthase), which functions in the pyrimidine biosynthetic pathway, has been linked to
resistance to fluoropyrimidine chemotherapy and also to poor prognosis in colorectal
carcinomas. Another notable gene upregulated in MM4 was the CAAX
farnesyltransferase gene, FNTA. Farnesyltransferase prenylates RAS, a post-translational
modification required for RAS to attach to the plasma membrane. These data
suggest that farnesyltransferase inhibitors may be effective in treating patients with high
levels of FNTA expression.

Two other genes coding for components of the proteasome pathway,
POH1 (26S proteasome-associated pad1 homolog) and UBL1 (ubiquitin-like protein 1),
were also overexpressed in MM4. Overexpression of POH1 confers
P-glycoprotein-independent, pleiotropic drug resistance to mammalian cells. UBL1, also
known as sentrin, is involved in many processes, including associating with RAD51,
RAD52, and p53 proteins in the double-strand break repair pathway; conjugating with
RANGAP1, which is involved in nuclear protein import; and, importantly for multiple
myeloma, protecting against both Fas/Apo-1 (TNFRSF6)- and TNFR1-induced apoptosis.
In contrast to normal plasma cells, more than 75% of multiple myeloma plasma cells
express abundant mRNA for the multidrug resistance gene lung resistance-related protein
(MVP). These data are consistent with previous reports showing that expression of MVP
in multiple myeloma is a poor prognostic factor. Given the uniform development of
chemotherapy resistance in multiple myeloma, the combined overexpression of POH1 and
MVP may have profound influences on this phenotype. The deregulated expression of
many genes whose products function in the proteasome pathway may be used in the
pharmacogenomic analysis of the efficacy of proteasome inhibitors like PS-341
(Millennium Pharmaceuticals, Cambridge, MA).

Another significantly upregulated gene in MM4 was the single-stranded
DNA-dependent, ATP-dependent helicase (G22P1), which is also known as the Ku70
autoantigen. The DNA helicase II complex, made up of p70 and p80, binds
preferentially to fork-like ends of double-stranded DNA in a cell cycle-dependent
manner. Binding to DNA is thought to be mediated by p70, and dimerization with p80
forms the ATP-dependent DNA-unwinding enzyme (helicase II), which acts as the
regulatory component of a DNA-dependent protein kinase (DNPK) that was also
significantly upregulated in MM4. The involvement of the helicase II complex in DNA
double-strand break repair, V(D)J recombination, and notably chromosomal
translocations has been proposed. Another upregulated gene was the DNA
fragmentation factor (DFFA). Caspase-3 cleaves the DFFA-encoded 45 kD subunit at
two sites to generate an active factor that produces DNA fragmentation during apoptosis
signaling. In light of the many blocks to apoptosis in multiple myeloma, DFFA
activation could result in DNA fragmentation, which in turn would activate the helicase II
complex that then may facilitate chromosomal translocations. Of note, abnormal
karyotypes, and thus chromosomal translocations, are associated with the MM4
subgroup, which tended to overexpress these two genes.

Hence, the results disclosed herein demonstrate that direct comparison of gene
expression patterns in multiple myeloma and normal plasma cells can identify novel
genes that could represent the fundamental changes associated with the malignant
transformation of plasma cells.

The progression of multiple myeloma as a hypoproliferative tumor is
thought to be linked to a defect in programmed cell death rather than to rapid cell
replication. Two genes overexpressed in multiple myeloma, prohibitin (PHB) and
quiescin Q6 (QSCN6), are involved in growth arrest. The overexpression of these genes
may be responsible for the typically low proliferation indices seen in multiple myeloma. It
is therefore conceivable that therapeutic downregulation of these genes, resulting in
enhanced proliferation, could render multiple myeloma cells more susceptible to cell
cycle-active chemotherapeutic agents.

The gene coding for CD27, TNFRSF7, the second most significantly
underexpressed gene in multiple myeloma, is a member of the tumor necrosis factor
receptor (TNFR) superfamily that provides co-stimulatory signals for T and B cell
proliferation and for B cell immunoglobulin production and apoptosis. Anti-CD27
significantly inhibits the induction of Blimp-1 and J-chain transcripts, which are turned
on in cells committed to plasma cell differentiation, suggesting that ligation of CD27 on B
cells may prevent terminal differentiation. CD27 ligand (CD70) prevents IL-10-mediated
apoptosis and directs differentiation of CD27+ memory B cells toward plasma cells in
cooperation with IL-10. Thus, it is possible that the downregulation of CD27 gene
expression in multiple myeloma may block an apoptotic program.

The overexpression of CD47 on multiple myeloma cells may be related to
the escape of these cells from immune surveillance. Studies have shown that cells
lacking CD47 are rapidly cleared from the bloodstream by splenic red pulp macrophages,
and that CD47 on normal red blood cells prevents this elimination.

The gene product of DNA methyltransferase 1 (DNMT1), overexpressed in
multiple myeloma, is responsible for cytosine methylation in mammals and has an
important role in epigenetic gene silencing. In fact, aberrant hypermethylation of tumor
suppressor genes plays an important role in the development of many tumors. De novo
methylation of p16/INK4a is a frequent finding in primary multiple myeloma. Also,
recent studies have shown that upregulated expression of DNMTs may contribute to the
pathogenesis of leukemia by inducing aberrant regional hypermethylation. DNA
methylation represses genes partly by recruitment of the methyl-CpG-binding protein
MeCP2, which in turn recruits a histone deacetylase activity. It has been shown that the
process of DNA methylation, mediated by Dnmt1, may depend on or generate an altered
chromatin state via histone deacetylase activity. It is potentially significant that multiple
myeloma cases also demonstrate significant overexpression of the gene for
metastasis-associated 1 (MTA1). MTA1 was originally identified as being highly
expressed in metastatic cells. MTA1 has more recently been discovered to be one subunit
of the NURD (NUcleosome Remodeling and histone Deacetylation) complex, which
contains not only ATP-dependent nucleosome disruption activity but also histone
deacetylase activity. Thus, overexpression of DNMT1 and MTA1 may have dramatic
effects on repressing gene expression in multiple myeloma.

Oncogenes activated in multiple myeloma included ABL and MYC.
Although it is not clear whether ABL tyrosine kinase activity is present in multiple
myeloma, it is important to note that overexpression of abl and c-myc results in the
accelerated development of mouse plasmacytomas. Thus, it may be more than a
coincidence that multiple myeloma cells significantly overexpress MYC and ABL.

Chromosomal translocations involving the MYC oncogene and the IGH and
IGL genes that result in dysregulated MYC expression are hallmarks of Burkitt's
lymphoma and experimentally induced mouse plasmacytomas; however,
MYC/IGH-associated translocations are rare in multiple myeloma. Although high MYC
expression was a common feature in our panel of multiple myeloma cases, it was quite
variable, ranging from little or no expression to highly elevated expression. It is also of
note that the MAZ gene, whose product is known to bind to and activate MYC
expression, was significantly upregulated in the MM4 subgroup. Given the important role
of MYC in B cell neoplasia, it is speculated that overexpression of MYC, and possibly
ABL, in multiple myeloma may have biological and possibly prognostic significance.

EXT1 and EXT2, which are tumor suppressor genes involved in hereditary
multiple exostoses, heterodimerize and are critical in the synthesis and display of cell
surface heparan sulfate glycosaminoglycans (GAGs). EXT1 is expressed in both multiple
myeloma and normal plasma cells. EXT2L was overexpressed in multiple myeloma,
suggesting that a functional glycosyltransferase could be created in multiple myeloma. It
is of note that syndecan-1 (CD138/SDC1), a transmembrane heparan sulfate
proteoglycan, is abundantly expressed on multiple myeloma cells and, when shed into
the serum, is a negative prognostic factor. Thus, abnormal GAG-modified SDC1 may be
important in multiple myeloma biology. The link of SDC1 to multiple myeloma biology
is further supported by the recent implication of SDC1 in the signaling cascade induced
by the WNT proto-oncogene products: syndecan-1 (SDC1) has been shown to be
required for Wnt-1-induced mammary tumorigenesis. Data disclosed herein indicated a
significant downregulation of WNT10B in primary multiple myeloma cases. It is also of
note that the WNT5A gene and the FRZB gene, which codes for a decoy WNT receptor,
were marginally upregulated in newly diagnosed multiple myeloma. Given that the
WNTs represent a novel class of B cell regulators, deregulation of the expression of these
growth factors (WNT5A, WNT10B), their receptors (e.g., FRZB), and gene products
that modulate receptor signaling (e.g., SDC1) may be important in the genesis of
multiple myeloma.

The present invention also identifies, through multivariate stepwise
discriminant analysis, a minimum subset of genes whose expression is intimately
associated with the malignant features of multiple myeloma. By applying linear
regression analysis to the top 50 statistically significant differentially expressed genes, 14
genes were defined as predictors that are able to differentiate multiple myeloma from
normal plasma cells with a high degree of accuracy. When the model was applied to a
validation group consisting of 118 multiple myeloma, 6 normal plasma cell, and 7 MGUS
samples, a classification accuracy of more than 99% was achieved. Importantly, 6 of the
7 MGUS cases were classified as multiple myeloma, indicating that MGUS has gene
expression features of malignancy. Thus, the altered expression of 14 of the more than
6,000 genes interrogated is capable of defining multiple myeloma. A similar
multivariate discriminant analysis also identified a set of 24 genes that can distinguish
between the four multiple myeloma subgroups described above.

In addition to identifying genes that were statistically different between
the group of normal plasma cells and multiple myeloma plasma cells, the present
invention also identified genes, like FGFR3 and CCND1, that demonstrate highly
elevated "spiked" expression in subsets of multiple myelomas. Patients with elevated
expression of these genes can show significant differences in distribution among the four
gene expression cluster subgroups. For example, FGFR3 spikes are found in MM1 and
MM2, whereas spikes of IFI27 are more likely to be found in MM3 and MM4. Highly
elevated expression of the interferon-induced gene IFI27 may be indicative of a viral
infection, either systemic or specifically within the plasma cells from these patients.
Correlation analysis has shown that IFI27 spikes are significantly linked (Pearson
correlation coefficient values of 0.60 to 0.77) to elevated expression of 14
interferon-induced genes, including MX1, MX2, OAS1, OAS2, IFIT1, IFIT4, PLSCR1,
and STAT1. More recent analysis of a large population of multiple myeloma patients
(N = 280) indicated that nearly 25% of all patients had spikes of the IFI27 gene. It is of
interest to determine whether the IFI27 spike patients who cluster in the MM4 subgroup
are more likely to have a poor clinical course, and to identify the suspected viral infection
causing the upregulation of this class of genes. Thus, spiked gene expression may also be
used in the development of clinically relevant prognostic groups.

Finally, the 100% coincidence of spiked FGFR3 or CCND1 gene
expression with the presence of the t(4;14)(p16;q32) or t(11;14)(q13;q32) translocations,
respectively, as well as the strong correlations between protein expression and gene
expression, represent important validations of the accuracy of gene expression profiling
and suggest that gene expression profiling may eventually supplant labor-intensive and
expensive clinical laboratory procedures, such as cell surface marker immunophenotyping
and molecular and cellular cytogenetics.
Genes identified by the present invention that show significantly
upregulated or downregulated expression in multiple myeloma are potential therapeutic
targets for multiple myeloma. Overexpressed genes may be targets for small molecules
or inhibitors that decrease their expression. Methods and materials that can be used to
inhibit gene expression, e.g., small-molecule drugs, antisense oligonucleotides, or
antibodies, would be readily apparent to a person having ordinary skill in this art.
Conversely, underexpressed genes can be replaced by gene therapy or induced by drugs.

Comparison of Multiple Myeloma with Normal Plasma Cell Development 

Data disclosed herein indicated that multiple myeloma can be placed into a
developmental schema parallel to that of normal plasma cell differentiation. Global gene
expression profiling reveals distinct changes in transcription associated with human
plasma cell differentiation. Hierarchical clustering analyses with 4866 genes segregated
tonsil B cells, tonsil plasma cells, and bone marrow plasma cells. Combining χ² and
Wilcoxon rank sum tests, 359 previously defined and novel genes significantly
(P < 0.0005) discriminated tonsil B cells from tonsil plasma cells, and 500 genes
significantly discriminated tonsil plasma cells from bone marrow plasma cells. Genes that
were significantly differentially expressed in the tonsil B cell to tonsil plasma cell
transition were referred to as "early differentiation genes" (EDGs), and those
differentially expressed in the tonsil plasma cell to bone marrow plasma cell transition
were referred to as "late differentiation genes" (LDGs). One-way ANOVA was then
applied to EDGs and LDGs to identify statistically significant expression differences
between multiple myeloma (MM) and tonsil B cells (EDG-MM), tonsil plasma cells
(LDG-MM1), or bone marrow plasma cells (LDG-MM2).

Hierarchical cluster analysis revealed that 13/18 (P = 0.0005) MM4 cases
(a putative poor-prognosis subtype) clustered tightly with tonsil B cells. The other
groups (MM1, MM2, and MM3) failed to show such associations. In contrast, there was
tight clustering between tonsil plasma cells and 14/15 (P = 0.00001) MM3 cases, and
significant similarities were found between bone marrow plasma cells and 14/20
(P = 0.0009) MM2 cases. MM1 showed no significant linkage with the normal cell types
studied. In addition, XBP1, a transcription factor essential for plasma cell differentiation,
exhibited a significant, progressive reduction in expression from MM1 to MM4,
consistent with developmental-stage relationships. Therefore, global gene expression
patterns linked to late-stage B cell differentiation confirmed and extended a global gene
expression-defined classification system of multiple myeloma, suggesting that multiple
myeloma represents a constellation of distinct subtypes of disease with unique origins.
The present invention is drawn to a method of gene-based classification
for multiple myeloma. Nucleic acid samples of isolated plasma cells derived from
individuals with or without multiple myeloma were applied to a DNA microarray, and
hierarchical clustering analysis of the data obtained from the microarray classified the
individuals into distinct subgroups, such as the MM1, MM2, MM3, and MM4 subgroups
disclosed herein.

In another embodiment of the present invention, there is provided a
method of identifying genes with elevated expression in subsets of multiple myeloma
patients. Nucleic acid samples of isolated plasma cells derived from individuals with
multiple myeloma were applied to a DNA microarray, and hierarchical clustering analysis
of the data obtained from the microarray identified genes with elevated expression in
subsets of multiple myeloma patients. Representative examples of these genes are listed
in Table 8.

In another embodiment of the present invention, there is provided a
method of identifying potential therapeutic targets for multiple myeloma. Nucleic acid
samples of isolated plasma cells derived from individuals with or without multiple
myeloma were applied to a DNA microarray, and hierarchical clustering analysis was
performed on data obtained from the microarray. Genes with significantly different
levels of expression in multiple myeloma patients as compared to normal individuals are
potential therapeutic targets for multiple myeloma. Representative examples of these
genes are listed in Tables 4, 5 and 8.

In yet another embodiment of the present invention, there is provided a
method of identifying a group of genes that can distinguish between normal plasma cells
and plasma cells of multiple myeloma. Nucleic acid samples of isolated plasma cells
derived from individuals with or without multiple myeloma were applied to a DNA
microarray, and hierarchical clustering analysis was performed on data obtained from the
microarray. Genes with statistically significant differential expression patterns were
identified, and linear regression analysis was used to identify a group of genes capable of
accurately discriminating between normal plasma cells and plasma cells of multiple
myeloma. Representative examples of these genes are listed in Table 6.

In still yet another embodiment of the present invention, there is provided
a method of identifying a group of genes that can distinguish between subgroups of
multiple myeloma. Nucleic acid samples of isolated plasma cells derived from individuals
with multiple myeloma were applied to a DNA microarray, and hierarchical clustering
analysis was performed on data obtained from the microarray. Genes with statistically
significant differential expression patterns were identified, and linear regression analysis
was used to identify a group of genes capable of accurately discriminating between
subgroups of multiple myeloma. Representative examples of these genes are listed in
Table 7.

In another embodiment of the present invention, there is provided a
method of diagnosis for multiple myeloma. Expression levels of a group of 14 genes as
listed in Table 6 were examined in plasma cells derived from an individual, wherein
statistically significant differential expression would indicate that such individual has
multiple myeloma. Gene expression levels can be examined at the nucleic acid level or
protein level according to methods well known to one of skill in the art.

In yet another embodiment of the present invention, there is provided a
method of diagnosis for subgroups of multiple myeloma. Expression levels of a group of
24 genes as listed in Table 7 were examined in plasma cells derived from an individual,
wherein statistically significant differential expression would provide a diagnosis of the
multiple myeloma subgroup. Gene expression levels can be examined at the nucleic acid
level or protein level according to methods well known to one of skill in the art.

In another embodiment of the present invention, there are provided
methods of treatment for multiple myeloma. Such methods involve inhibiting expression
of one of the genes listed in Table 5 or Table 8, or increasing expression of one of the
genes listed in Table 4. Methods of inhibiting or increasing gene expression, such as those
using antisense oligonucleotides or antibodies, are well known to one of skill in the art.

The present invention is also drawn to a method of developmental
stage-based classification for multiple myeloma. Nucleic acid samples of isolated B cells
and plasma cells derived from individuals with or without multiple myeloma were
applied to a DNA microarray, and hierarchical clustering analysis of the data obtained
from the microarray classified the multiple myeloma cells according to the
developmental stages of normal B cells and plasma cells. In general, normal B cells and
plasma cells are isolated from tonsil, bone marrow, mucosal tissue, lymph node, or
peripheral blood.

The following examples are given for the purpose of illustrating various
embodiments of the invention and are not meant to limit the present invention in any
fashion.

EXAMPLE 1 

Cell Isolation And Analysis 

Samples for the following studies included plasma cells from 74 newly
diagnosed cases of multiple myeloma, 5 subjects with monoclonal gammopathy of
undetermined significance, 7 samples of tonsil B lymphocytes (tonsil BCs), 11 samples
of tonsil plasma cells (tonsil PCs), and 31 bone marrow PCs derived from normal healthy
donors. Multiple myeloma cell lines (U266, ARP1, RPMI-8226, UUN, ANBL-6, CAG,
and H929 [courtesy of P.L. Bergsagel]) and an Epstein-Barr virus (EBV)-transformed
B-lymphoblastoid cell line (ARH-77) were grown as recommended (ATCC, Chantilly,
VA).

Tonsils were obtained from patients undergoing tonsillectomy for chronic
tonsillitis. Tonsil tissues were minced, gently teased, and filtered. The mononuclear cell
fractions from tonsil preparations and bone marrow aspirates were separated by a
standard Ficoll-Hypaque gradient (Pharmacia Biotech, Piscataway, NJ). The cells in the
light density fraction (S.G. <1.077) were resuspended in cell culture medium with 10%
fetal bovine serum, red blood cells were lysed, and several PBS wash steps were
performed. Plasma cell isolation was performed with anti-CD138 immunomagnetic bead
selection as previously described (Zhan et al., 2002). B lymphocyte isolation was
performed using directly conjugated monoclonal mouse anti-human CD19 antibodies and
the AutoMacs automated cell sorter (Miltenyi-Biotec, Auburn, CA).

For cytology, approximately 40,000 purified tonsil BC and PC
mononuclear cells were cytocentrifuged at 1000 x g for 5 min at room temperature. For
morphological studies, the cells were immediately fixed and stained with DiffQuick
fixative and stain (Dade Diagnostics, Aguada, PR).
For immunofluorescence, slides were treated essentially as described
(Shaughnessy et al., 2000). Briefly, slides were air-dried overnight, then fixed in 100%
ethanol for 5 min at room temperature and baked in a dry 37°C incubator for 6 hr. The
slides were then stained with 100 µl of a 1:20 dilution of goat anti-human kappa
immunoglobulin light chain conjugated with 7-amino-4-methylcoumarin-3-acetic acid
(AMCA) (Vector Laboratories, Burlingame, CA) for 30 min in a humidified chamber.
After incubation, the slides were washed two times in 1 x PBS + 0.1% NP-40 (PBD).
To enhance the AMCA signal, the slides were incubated with 100 µl of a 1:40 dilution of
AMCA-labeled rabbit anti-goat IgG antibody for 30 min at room temperature in a
humidified chamber. Slides were washed two times in 1 x PBD. The slides were then
stained with 100 µl of a 1:100 dilution of goat anti-human lambda immunoglobulin light
chain conjugated with FITC (Vector Laboratories, Burlingame, CA) for 30 min in a
humidified chamber and washed two times in 1 x PBD. The slides were then stained with
propidium iodide at 0.1 µg/ml in 1 x PBS for 5 min and washed in 1 x PBD; 10 µl of
anti-fade (Molecular Probes, Eugene, OR) was added, and coverslips were placed.
Cytoplasmic immunoglobulin light chain-positive PCs were visualized using an Olympus
BX60 epifluorescence microscope equipped with appropriate filters. The images were
captured using a Quips XL genetic workstation (Vysis, Downers Grove, IL).

Both unpurified mononuclear cells and purified fractions from tonsil BCs,
tonsil PCs, and bone marrow PCs were subjected to flow cytometric analysis of CD
marker expression using a panel of antibodies directly conjugated to FITC or PE.
Markers used in the analysis included FITC-labeled CD20, PE-labeled CD38, FITC- or
ECD-labeled CD45, and PE- or PC5-labeled CD138 (Beckman Coulter, Miami, FL).
For detection of CD138 on PCs after CD138 selection, we employed an indirect
detection strategy using a FITC-labeled rabbit anti-mouse IgG antibody (Beckman
Coulter) to detect the mouse monoclonal anti-CD138 antibody BB4 used in the
immunomagnetic selection technique. Cells were taken after Ficoll-Hypaque gradient
separation or after cell purification, washed in PBS, and stained at 4°C with CD
antibodies or isotype-matched control G1 antibodies (Beckman Coulter). After staining,
cells were resuspended in 1 x PBS and analyzed using an Epics XL-MCL flow cytometry
system (Beckman Coulter).

EXAMPLE 2 

Preparation Of Labeled cRNA And Hybridization To High-Density Microarray

Total RNA was isolated with the RNeasy Mini Kit (Qiagen, Valencia, CA).
Double-stranded cDNA and biotinylated cRNA were synthesized from total RNA and
hybridized to HuGeneFL GeneChip microarrays (Affymetrix, Santa Clara, CA), which
were washed and scanned according to procedures developed by the manufacturer. The
arrays were scanned using a Hewlett-Packard confocal laser scanner and visualized using
Affymetrix 3.3 software (Affymetrix). Arrays were scaled to an average intensity of
1,500 and analyzed independently.

EXAMPLE 3 

GeneChip Data Analysis

To efficiently manage and mine high-density oligonucleotide DNA
microarray data, a new data-handling tool was developed. GeneChip-derived expression
data were stored on an MS SQL Server. This database was linked, via an MS Access
interface called Clinical Gene-Organizer, to multiple clinical parameter databases for
multiple myeloma patients. This Data Mart concept allows gene expression profiles to
be directly correlated with clinical parameters and clinical outcomes using standard
statistical software. All data used in the present analysis were derived from Affymetrix
3.3 software. GeneChip 3.3 output files are given (1) as an average difference (AD) that
represents the difference between the intensities of the sequence-specific perfect match
probe set and the mismatch probe set, or (2) as an absolute call (AC) of present or absent
as determined by the GeneChip 3.3 algorithm. Average difference calls were transformed
by the natural log after substituting any sample with an average difference of <60 with
the value 60 (2.5 times the average Raw Q). Statistical analysis of the data was
performed with the software packages SPSS 10.0 (SPSS, Chicago, IL), S-Plus 2000
(Insightful Corp., Seattle, WA), and Gene Cluster/Treeview (Eisen et al., 1998).
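A minimal sketch of the AD transformation described above, assuming the AD values for one array arrive as a plain Python list; the function name `transform_average_difference` is hypothetical and not part of the GeneChip software.

```python
import math

# Floor stated in the text: AD values below 60 (2.5 times the average Raw Q)
# are replaced with 60 before log transformation.
AD_FLOOR = 60.0

def transform_average_difference(ad_values, floor=AD_FLOOR):
    """Natural-log transform GeneChip average-difference (AD) values.

    Any AD below `floor`, including negative AD values, is first replaced
    with `floor`, so the logarithm is always defined.
    """
    return [math.log(max(float(v), floor)) for v in ad_values]
```

With this rule, `transform_average_difference([-35, 42, 60, 1500])` maps the first three values to ln(60) and the last to ln(1500), so low and negative signals collapse onto a common noise floor.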

To differentiate four distinct subgroups of multiple myeloma (MM1,
MM2, MM3, and MM4), average-linkage hierarchical clustering with the centered
correlation metric was employed. The clustering was done on the average difference data
of 5,483 genes. Either the χ² test or Fisher's exact test was used to find significant
differences between cluster groups with the AC data. To compare expression levels, the
non-parametric Wilcoxon rank sum (WRS) test was used. This test uses a null hypothesis
that is based on ranks rather than on normally distributed data. Before the above tests
were performed, genes that were absent (AC) across all samples were removed; 5,483
genes were used in the analyses. Genes that were significant (P < 0.0001) in both the χ²
test and the WRS test were considered to be significantly differentially expressed.
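The dual significance criterion above can be sketched with only the Python standard library. This is an illustration under stated assumptions, not the SPSS/S-Plus analysis used in the study: the WRS p-value uses the large-sample normal approximation without a tie correction, the χ² test is Pearson's test on a 2 x 2 table of present/absent calls without continuity correction, and Fisher's exact test (the alternative for small expected counts) is not implemented. All function names are hypothetical.

```python
import math

def _average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_rank_sum_p(x, y):
    """Two-sided WRS p-value via the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = _average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))

def chi_square_2x2_p(present_a, n_a, present_b, n_b):
    """P-value of Pearson's chi-square (1 df) on present/absent call counts."""
    observed = [[present_a, n_a - present_a], [present_b, n_b - present_b]]
    n = n_a + n_b
    col_totals = [present_a + present_b, n - present_a - present_b]
    stat = 0.0
    for r, row_total in enumerate((n_a, n_b)):
        for c in range(2):
            expected = row_total * col_totals[c] / n
            if expected > 0:
                stat += (observed[r][c] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df equals P(|Z| > sqrt(stat)).
    return math.erfc(math.sqrt(stat / 2))

def significantly_different(ln_ad_a, calls_a, ln_ad_b, calls_b, alpha=1e-4):
    """Both tests must reach P < alpha; calls_* are booleans (True = present)."""
    p_wrs = wilcoxon_rank_sum_p(ln_ad_a, ln_ad_b)
    p_chi = chi_square_2x2_p(sum(calls_a), len(calls_a), sum(calls_b), len(calls_b))
    return p_wrs < alpha and p_chi < alpha
```

Requiring both tests keeps only genes that differ in detection rate (AC) and in expression level (AD) at the same time, which is stricter than either test alone.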

Clinical parameters were tested across multiple myeloma cluster groups.
The ANOVA test was used to test continuous variables, and the χ² test of independence or
Fisher's exact test was applied to test discrete variables. The natural log of the average
difference data was used to find genes with a "spiked profile" of expression in multiple
myeloma. Genes were identified that had low to undetectable expression in the majority
of patients and normal samples (no more than 4 present absolute calls [P-AC]). A total
of 2,030 genes fit the criteria of this analysis. The median expression value of each of the
genes across all patient samples was determined. For the i-th gene, this value was called
medgene(i). The i-th gene was a "spiked" gene if it had at least 4 patient expression values
> 2.5 + medgene(i). The constant 2.5 was based on the log of the average difference data.
The "spiked" genes were further divided into subsets according to whether or not the
largest spike had an average difference expression value greater than 10,000.
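The spike rule can be written out directly. In this illustrative sketch (the function name and return labels are hypothetical), `log_ad` holds one gene's ln(AD) values across patient samples and `n_present_calls` is its count of present absolute calls, which the caller is assumed to supply; the 10,000 cut-off is applied to the largest spike after converting it back from the log scale.

```python
import math
from statistics import median

def spike_category(log_ad, n_present_calls, max_present=4,
                   offset=2.5, min_spikes=4, high_ad=10_000.0):
    """Classify one gene's ln(AD) profile across patient samples.

    Returns None if the gene fails the present-call prefilter, otherwise
    "not spiked", "spiked", or "spiked (AD > 10,000)".
    """
    if n_present_calls > max_present:
        return None
    medgene = median(log_ad)
    # A spike is any patient value above offset + medgene(i).
    spikes = [v for v in log_ad if v > offset + medgene]
    if len(spikes) < min_spikes:
        return "not spiked"
    # Compare the largest spike on the original AD scale.
    return "spiked (AD > 10,000)" if math.exp(max(spikes)) > high_ad else "spiked"
```

Because the offset is applied on the natural-log scale, 2.5 corresponds to roughly a 12-fold (e^2.5) elevation over the gene's median.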

To determine transcriptional changes associated with human plasma cell
differentiation, a total of 4866 genes were scanned across 7 cases each of tonsil B cells,
tonsil plasma cells, and bone marrow plasma cells. The 4866 genes were derived from
6800 by filtering out all control genes and genes with Max - Min < 1.5 (1.5 being on the
natural log scale of the average difference). The χ² test was used to eliminate genes with
absent absolute calls (AAC). For example, in the tonsil plasma cell to bone marrow
plasma cell comparison, genes with χ² values greater than 3.84 (P < 0.05), or having
"present" AC (PAC) in more than half of the samples in each group, were retained.
In the tonsil B cell to tonsil plasma cell and tonsil plasma cell to bone marrow plasma cell
comparisons, 2662 and 2549 genes, respectively, were retained as discriminating between
the two groups. To compare gene expression levels, the non-parametric Wilcoxon rank
sum (WRS) test was used to compare two groups using natural log-transformed AD. The
cutoff P value depended on the sample size, the heterogeneity of the two comparative
populations (tonsil B cells, tonsil plasma cells, and bone marrow plasma cells showed a
higher degree of stability in AD), and the degree of significance. In this analysis, 496
and 646 genes were found to be significantly (P < 0.0005) differentially expressed in the
tonsil B cell versus tonsil plasma cell and tonsil plasma cell versus bone marrow plasma
cell comparisons, respectively. To define the direction of significance (expression
changes being up or down in one group compared with the other), the non-parametric
Spearman correlation test of the AD was employed.
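The gene-retention rules and the direction-of-change step in the preceding paragraph can be sketched as follows. This is a minimal illustration, not the software used in the study: the function names are hypothetical, the χ² statistic is Pearson's test without continuity correction on a 2 x 2 table of present/absent calls, and "more than half of the samples in each group" is read as requiring a present-call majority in both groups.

```python
import math

def range_filter(log_ad, min_range=1.5):
    """Keep a gene only if the range (Max - Min) of its ln(AD) values is >= min_range."""
    return max(log_ad) - min(log_ad) >= min_range

def chi_square_2x2(present_a, n_a, present_b, n_b):
    """Pearson chi-square statistic (no continuity correction) on present/absent calls."""
    observed = [[present_a, n_a - present_a], [present_b, n_b - present_b]]
    n = n_a + n_b
    col_totals = [present_a + present_b, n - present_a - present_b]
    stat = 0.0
    for r, row_total in enumerate((n_a, n_b)):
        for c in range(2):
            expected = row_total * col_totals[c] / n
            if expected > 0:
                stat += (observed[r][c] - expected) ** 2 / expected
    return stat

def retain_by_calls(present_a, n_a, present_b, n_b, critical=3.84):
    """Retain if chi-square > 3.84 (P < 0.05, 1 df) or PAC in > half of each group."""
    majority_present = present_a > n_a / 2 and present_b > n_b / 2
    return chi_square_2x2(present_a, n_a, present_b, n_b) > critical or majority_present

def _average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_direction(ad_group_a, ad_group_b):
    """Spearman rho between expression and group label (0 = group A, 1 = group B).

    A positive rho means higher expression in group B, a negative rho lower.
    """
    ranked_x = _average_ranks(list(ad_group_a) + list(ad_group_b))
    ranked_y = _average_ranks([0] * len(ad_group_a) + [1] * len(ad_group_b))
    mx = sum(ranked_x) / len(ranked_x)
    my = sum(ranked_y) / len(ranked_y)
    num = sum((a - mx) * (b - my) for a, b in zip(ranked_x, ranked_y))
    den = math.sqrt(sum((a - mx) ** 2 for a in ranked_x)
                    * sum((b - my) ** 2 for b in ranked_y))
    rho = num / den if den else 0.0
    if rho > 0:
        return rho, "up in B"
    if rho < 0:
        return rho, "down in B"
    return rho, "no change"
```

Spearman's rho is computed here as the Pearson correlation of the two rank vectors, so only its sign is needed to label a WRS-significant gene as up- or downregulated.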

Genes that were significantly differentially expressed in the tonsil B cell
to tonsil plasma cell transition were referred to as "early differentiation genes" (EDGs),
and those differentially expressed in the tonsil plasma cell to bone marrow plasma cell
transition were referred to as "late differentiation genes" (LDGs). Previously defined and
novel genes were identified that significantly discriminated tonsil B cells from tonsil
plasma cells (359 genes) and tonsil plasma cells from bone marrow plasma cells (500
genes).

To classify multiple myeloma with respect to EDGs and LDGs, 74 newly
diagnosed cases of multiple myeloma and 7 tonsil B cell, 7 tonsil plasma cell, and 7 bone
marrow plasma cell samples were tested for variance across the 359 EDGs and 500
LDGs. The top 50 EDGs that showed the most significant variance across all samples
were defined as early differentiation genes for myeloma (EDG-MM). Likewise, the top
50 LDGs showing the most significant variance across all samples were identified as late
differentiation genes for myeloma-1 (LDG-MM1). Subtracting the LDG-MM1 from the
500 LDGs and then applying the one-way ANOVA test for variance to the remaining
genes identified the top 50 genes showing similarities between bone marrow plasma cells
and multiple myeloma. These genes were defined as LDG-MM2.
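The top-50 selection can be sketched as ranking genes by a one-way ANOVA F statistic computed across sample groups. This is an illustration under assumptions: the grouping of samples (for example multiple myeloma, tonsil B cells, tonsil plasma cells, and bone marrow plasma cells), the function names, and the use of the F statistic as the ranking score are ours, not details given in the text.

```python
import math

def one_way_anova_f(groups):
    """One-way ANOVA F statistic for one gene.

    `groups` is a list of lists of ln(AD) values, one list per sample type.
    """
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)

def top_genes(expression, k=50, exclude=()):
    """Return the k genes with the largest F statistic, skipping `exclude`.

    When every gene has the same group sizes, ranking by F is equivalent to
    ranking by ANOVA p-value, so no F-distribution p-value is computed.
    """
    scored = sorted(
        ((one_way_anova_f(groups), gene)
         for gene, groups in expression.items() if gene not in exclude),
        reverse=True,
    )
    return [gene for _, gene in scored[:k]]
```

For the LDG-MM2 step, a call such as `top_genes(ldg_expression, 50, exclude=set(ldg_mm1))` would mirror the subtraction described above, where `ldg_expression` and `ldg_mm1` are hypothetical variables.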

Hierarchical clustering was applied to all samples using 30 of the 50
EDG-MM; the other 20 genes were filtered out because their Max - Min was < 2.5. This
filtering was performed on this group because many of the top 50 EDG-MM showed no
variability across multiple myeloma and thus could not be used to distinguish multiple
myeloma subgroups. A similar clustering strategy was employed to cluster the samples
using the 50 LDG-MM1 and 50 LDG-MM2; however, in these cases all 50 significant
genes were used in the cluster analysis.
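To make the clustering step concrete, the following is a naive sketch of average-linkage hierarchical clustering with the centered correlation metric (distance taken as 1 - Pearson correlation), preceded by the Max - Min range filter. It is not the Gene Cluster/Treeview implementation used in the study; the function names and the toy data are hypothetical, and the exhaustive search over cluster pairs at every merge is practical only for modest sample numbers.

```python
import math

def centered_correlation(x, y):
    """Pearson (centered) correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den if den else 0.0

def filter_by_range(genes, min_range=2.5):
    """Drop genes whose ln(AD) range (Max - Min) is below min_range."""
    return {g: v for g, v in genes.items() if max(v) - min(v) >= min_range}

def sample_profiles(genes, sample_names):
    """Transpose gene -> per-sample values into sample -> per-gene vector."""
    return {s: [values[i] for values in genes.values()]
            for i, s in enumerate(sample_names)}

def average_linkage(profiles):
    """Agglomerative clustering of samples.

    Distance = 1 - centered correlation; the distance between two clusters is
    the mean of all pairwise member distances (average linkage).
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    names = list(profiles)
    dist = {(a, b): 1.0 - centered_correlation(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

In a toy run with four samples where S1 and S2 share one expression pattern and S3 and S4 the opposite, the first two merges join {S1, S2} and {S3, S4}, and a flat gene is removed by the range filter before clustering.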

EXAMPLE 4

RT-PCR And Immunohistochemistry

RT-PCR for the FGFR3 MMSET was performed on the same cDNAs
used in the microarray analysis. Briefly, cDNA was mixed with the IGJH2 primer
(5'-CAATGGTCACCGTCTCTTCA-3', SEQ ID No. 1) and the MMSET primer
(5'-CCTCAATTTCCTGAAATTGGTT-3', SEQ ID No. 2). PCR reactions consisted
of 30 cycles with a 58°C annealing temperature and a 1-minute extension time at 72°C
using a Perkin-Elmer GeneAmp 2400 thermocycler (Wellesley, MA). PCR products
were visualized by ethidium bromide staining after agarose gel electrophoresis.

Immunohistochemical staining was performed on a Ventana ES (Ventana
Medical Systems, Tucson, AZ) using Zenker-fixed, paraffin-embedded bone marrow
sections, an avidin-biotin peroxidase complex technique (Ventana Medical Systems), and
the antibody L26 (CD20, Ventana Medical Systems). Heat-induced epitope retrieval was
performed by microwaving the sections for 28 minutes in 1.0 mmol/L citrate buffer at
pH 6.0.

EXAMPLE 5

Interphase FISH 

For interphase detection of the t(11;14)(q13;q32) translocation fusion
signal, an LSI IGH/CCND1 dual-color, dual-fusion translocation probe was used (Vysis,
Inc., Downers Grove, IL). The TRI-FISH procedure used to analyze the samples has
been previously described. Briefly, at least 100 clonotypic plasma cells identified by cIg
staining were counted for the presence or absence of the translocation fusion signal in all
samples except one, which yielded only 35 plasma cells. A multiple myeloma sample
was defined as having the translocation when >25% of the cells contained the fusion.
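The scoring rule above reduces to a simple threshold. The sketch below is illustrative only; the function name and the low-count flag (for samples, like the one noted above, with fewer than 100 scorable plasma cells) are assumptions, not part of the described procedure.

```python
def fish_translocation_call(fusion_positive, cells_scored, threshold=0.25, min_cells=100):
    """Apply the >25% fusion-signal rule to one sample.

    Returns (has_translocation, low_cell_count); the second flag marks samples
    in which fewer than `min_cells` plasma cells could be scored.
    """
    if cells_scored <= 0:
        raise ValueError("no plasma cells were scored")
    fraction = fusion_positive / cells_scored
    return fraction > threshold, cells_scored < min_cells
```

Note that the rule is strict: exactly 25 fusion-positive cells out of 100 is not called a translocation, whereas 10 of 35 cells (about 29%) is, but would be flagged as a low-count sample.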
EXAMPLE 6

Hierarchical Clustering of Plasma Cell Gene Expression Demonstrates Class Distinction

As a result of 656,000 measurements of gene expression in 118 plasma
cell samples, altered gene expression in the multiple myeloma samples was identified.
Two-dimensional hierarchical clustering differentiated cell types by gene expression
when performed on 5,483 genes that were expressed in at least one of the 118 samples
(Figure 1A). The sample dendrogram derived two major branches (Figures 1A and 1D).
One branch contained all 31 normal samples and a single monoclonal gammopathy of
undetermined significance case, whereas the second branch contained all 74 multiple
myeloma cases, 4 monoclonal gammopathy of undetermined significance cases, and the 8
cell lines. The multiple myeloma-containing branch was further divided into two
sub-branches, one containing the 4 monoclonal gammopathy of undetermined
significance cases and the other the 8 multiple myeloma cell lines. The cell lines all
clustered next to one another, showing a high degree of similarity in gene expression
among the cell lines. This suggested that multiple myeloma could be differentiated from
normal plasma cells and that at least two different classes of multiple myeloma could be
identified, one more similar to monoclonal gammopathy of undetermined significance and
the other similar to multiple myeloma cell lines.

Hierarchical clustering analysis with all 118 samples together with 

2 0 duplicate samples from 1 2 patients (plasma cells taken 24 hr or 48 hr after initial sample) 

were repeated to show reproducibility of the technique and analysis. All samples from 
the 12 patients studied longitudinally were found to cluster adjacent to one another. This 
indicated that gene expression in samples from the same patient were more similar to 
each other than they were to all other samples (data not shown). 
In addition to the reproducibility of clustering noted above, three microarray analyses were performed on a single source of RNA from one patient. When included in the cluster analysis, the three samples clustered adjacent to one another. Consistent with the manufacturer's specification, an analysis of the fold changes among these replicates showed that <2% of all genes had a >2-fold difference. Hence, these data indicated that the measurements were reproducible for the same sample.
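The replicate check above amounts to counting genes whose values differ by more than 2-fold between two arrays (for three replicates, each pair would be compared). A minimal sketch follows; the floor applied to low or negative signals before taking ratios is a hypothetical choice, not taken from the source, and the four-gene input is illustrative only.

```python
def fraction_over_fold(rep_a, rep_b, fold=2.0, floor=20.0):
    """Fraction of genes whose expression differs by more than `fold`
    between two replicate arrays. Values below `floor` are raised to it
    so that near-zero or negative signals do not produce huge ratios
    (the floor value is a hypothetical assumption)."""
    over = 0
    for a, b in zip(rep_a, rep_b):
        a, b = max(a, floor), max(b, floor)
        if max(a, b) / min(a, b) > fold:
            over += 1
    return over / len(rep_a)


# Hypothetical replicate values for four genes on two arrays.
print(fraction_over_fold([100, 200, 50, 1000], [110, 90, 60, 980]))  # 0.25
```

Under the reported result, this fraction would be below 0.02 for every replicate pair.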

The clustergram (Figure 1A) showed that genes of unrelated sequence but similar function clustered tightly together along the vertical axis. For example, a particular cluster of 22 genes, primarily those encoding immunoglobulin molecules and major histocompatibility molecules, had relatively low expression in multiple myeloma plasma cells and high expression in normal plasma cells (Figure 1B). This was anticipated, given that the plasma cells isolated from multiple myeloma are clonal and hence express only a single immunoglobulin light-chain and heavy-chain variable and constant region gene, whereas plasma cells from normal donors are polyclonal and express many different genes of these two classes. Another cluster of 195 genes was highly enriched for numerous oncogenes/growth-related genes (e.g., MYC, ABL1, PHB, and EXT2), cell cycle-related genes (e.g., CDC37, CDK4, and CKS2), and translation machinery genes (EIF2, EIF3, HTF4A, and TFIIA) (Figure 1C). These genes were all highly expressed in multiple myeloma, especially in multiple myeloma cell lines, but had low expression levels in normal plasma cells.

EXAMPLE 7 

Hierarchical Clustering of Newly Diagnosed Multiple Myeloma Identifies Four Distinct Subgroups

Two-dimensional cluster analysis was performed on the 74 multiple myeloma cases alone. The sample dendrogram identified two major branches with two distinct subgroups within each branch (Figure 1E). The four subgroups were designated MM1, MM2, MM3, and MM4, containing 20, 21, 15, and 18 patients, respectively. The MM1 subgroup represented the patients whose plasma cells were most closely related to the monoclonal gammopathy of undetermined significance plasma cells, and the MM4 patients were most like the multiple myeloma cell lines (see Figure 1D). These data suggested that the four gene expression subgroups were authentic and might represent four distinct clinical entities.

Differences in gene expression across the four subgroups were then examined using the χ2 and WRS tests (Table 1). As expected, the largest difference was between MM1 and MM4 (205 genes) and the smallest difference was between MM1 and MM2 (24 genes). Next, the top 30 genes turned on or upregulated in MM4 as compared with MM1 were examined (Table 2). Thirteen of the 30 most significant genes (10 of the top 15 genes) were involved in DNA replication/repair or cell cycle control. Thymidylate synthase (TYMS), which was present in all 18 samples comprising the MM4 subgroup, was present in only 3 of the 20 MM1 samples and was the most significant gene in the χ2 test. The DNA mismatch repair gene mutS (E. coli) homolog 2 (MSH2), with a WRS P value of 2.88 × 10^-6, was the most significant gene in the WRS test. Other notable genes in the list included the CAAX farnesyltransferase (FNTA), the transcription factors enhancer of zeste homolog 2 (EZH2) and MYC-associated zinc finger protein (MAZ), the eukaryotic translation initiation factors EIF2S1 and EIF2B1, the mitochondrial translation initiation factor 2 (MTIF2), the chaperonin CCT4, UDP-glucose pyrophosphorylase 2 (IUGP2), and the 26S proteasome-associated pad1 homolog (POH1).
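The χ2 values in Table 2 appear consistent with a Yates continuity-corrected 2 × 2 test on present/absent calls (group × present/absent). The source does not name the correction, so this is an inference: under that assumption the sketch below reproduces the tabulated values for TYMS (3 of 20 vs 18 of 18, χ2 = 24.35), PRKDC, MSH2 and FKBP1A. Returning 0 when a margin is empty (a gene present in every sample) is also an assumption, chosen because it matches the 0.00 entries in the table.

```python
def yates_chi_square(present_a, n_a, present_b, n_b):
    """Yates-corrected chi-square for a 2x2 table of present/absent calls
    in two groups of samples of sizes n_a and n_b."""
    a, b = present_a, n_a - present_a   # group A: present, absent
    c, d = present_b, n_b - present_b   # group B: present, absent
    n = n_a + n_b
    denom = n_a * n_b * (a + c) * (b + d)
    if denom == 0:
        # Gene present (or absent) in every sample: no presence/absence signal.
        return 0.0
    diff = max(abs(a * d - b * c) - n / 2, 0.0)   # continuity correction
    return n * diff ** 2 / denom


# Present-call counts from Table 2 (MM1, N = 20 versus MM4, N = 18).
for gene, mm1, mm4 in [("TYMS", 3, 18), ("PRKDC", 2, 17), ("MSH2", 0, 9), ("FKBP1A", 18, 18)]:
    print(gene, round(yates_chi_square(mm1, 20, mm4, 18), 2))
# TYMS 24.35, PRKDC 23.75, MSH2 10.48, FKBP1A 0.42
```

The χ2 test only sees presence or absence, which is why genes such as UBL1 and CCT4, present in every sample, score 0.00 yet still reach significance in the quantitative WRS test.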

To assess the validity of the clusters with respect to clinical features, correlations of various clinical parameters across the 4 subgroups were analyzed (Table 3). Of 17 clinical variables tested, the presence of an abnormal karyotype (P = .0003) and serum β2M levels (P = .0005) differed significantly among the four subgroups, and increased creatinine (P = .06) and cytogenetic deletion of chromosome 13 (P = .09) were marginally significant. The MM4 subgroup tended to have higher β2M and creatinine levels, as well as more frequent abnormal karyotypes and chromosome 13 deletions, than the other 3 subgroups.

TABLE 1

Differences in Gene Expression Among Multiple Myeloma Subgroups

Comparison    Range of WRS* P Values      Number of Genes
MM1 vs MM4    .00097 to 9.58 × 10^-?      205
MM2 vs MM4    .00095 to 1.04 × 10^-?      162
MM3 vs MM4    .00098 to 3.75 × 10^-?      119
MM1 vs MM3    .00091 to 6.27 × 10^-6      68
MM2 vs MM3    .00097 to 1.98 × 10^-5      44
MM1 vs MM2    .00083 to 2.93 × 10^-5      24

*Wilcoxon rank sum test. Comparisons are ordered by the number of significant genes; genes were counted when their WRS P value was < 0.001.
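Table 1 counts genes whose Wilcoxon rank-sum (WRS) P value falls below 0.001. The source does not say whether exact or large-sample P values were computed; the sketch below assumes the normal approximation, with average ranks for ties and the standard tie correction to the variance. The function name and toy values are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test of sample x against sample y.

    Returns (W, z, p): W is the rank sum of x in the pooled data (ties get
    average ranks), z is its normal-approximation score with the usual tie
    correction to the variance, and p is the two-sided P value.
    """
    pooled = sorted((value, idx) for idx, value in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    n1, n2 = len(x), len(y)
    n = n1 + n2
    w = sum(ranks[:n1])                     # original positions 0..n1-1 are x
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:
        return w, 0.0, 1.0                  # every value tied: no evidence
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal tail
    return w, z, p


# Hypothetical AD values for one gene in two small groups.
w, z, p = rank_sum_test([120, 80, 95, 60], [900, 1500, 700, 1100])
print(w, round(z, 3), p < 0.05)  # 10.0 -2.309 True
```

With groups as small as these, an exact permutation P value would be preferable; the approximation is reasonable for group sizes like the 74 myeloma and 31 normal samples analyzed here.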
TABLE 2

The 30 Most Differentially Expressed Genes in a Comparison of MM1 and MM4 Subgroups

Accession*  Function                 Gene Symbol  MM1 (N=20) P  MM4 (N=18) P  Chi Square  WRS‡ P Value
D00596      DNA replication          TYMS         3             18            24.35       1.26 × 10^-4
U35835      DNA repair               PRKDC        2             17            23.75       4.55 × 10^-6
U77949      DNA replication          CDC6         1             13            15.62       5.14 × 10^-?
U91985      DNA fragmentation        DFFA         1             12            13.38       6.26 × 10^-5
U61145      transcription            EZH2         4             15            12.77       1.67 × 10^-?
U20979      DNA replication          CHAF1A       2             12            10.75       1.10 × 10^-?
U03911      DNA repair               MSH2         0             9             10.48       2.88 × 10^-6
X74330      DNA replication          PRIM1        0             9             10.48       9.36 × 10^-6
X12517      snRNP                    SNRPC        0             9             10.48       5.26 × 10^-6
D85131      transcription            MAZ          0             9             10.48       1.08 × 10^-5
L00634      farnesyltransferase      FNTA         10            18            9.77        7.28 × 10^-5
U21090      DNA replication          POLD2        11            18            8.27        8.05 × 10^-5
X54941      cell cycle               CKS1         10            17            7.07        1.26 × 10^-?
U62136      cell cycle               UBE2V2       13            18            5.57        4.96 × 10^-6
D38076      cell cycle               RANBP1       13            18            5.57        7.34 × 10^-6
X95592      unknown                  C1D†         13            18            5.57        1.10 × 10^-4
X66899      cell cycle               EWSR1        14            18            4.35        1.89 × 10^-4
L34600      translation initiation   MTIF2        14            18            4.35        3.09 × 10^-5
U27460      metabolism               IUGP2        15            18            3.22        1.65 × 10^-4
U15009      snRNP                    SNRPD3       15            18            3.22        1.47 × 10^-5
J02645      translation initiation   EIF2S1       16            18            2.18        7.29 × 10^-5
X95648      translation initiation   EIF2B1       16            18            2.18        1.45 × 10^-4
M34539      calcium signaling        FKBP1A       18            18            0.42        1.71 × 10^-5
J04611      DNA repair               G22P1        18            18            0.42        7.29 × 10^-5
U67122      anti-apoptosis           UBL1         20            18            0.00        7.29 × 10^-5
U38846      chaperonin               CCT4         20            18            0.00        1.26 × 10^-4
U80040      metabolism               ACO2         20            18            0.00        8.38 × 10^-5
U86782      proteasome               POH1†        20            18            0.00        5.90 × 10^-5
X57152      signaling                CSNK2B       20            18            0.00        7.29 × 10^-5
D87446      unknown                  KIAA0257†    20            18            0.00        1.26 × 10^-5

* Accession numbers listed are GenBank numbers. † Gene symbol not HUGO approved. ‡ Wilcoxon rank sum test. P, number of samples in which the gene was called present.

TABLE 3

Clinical Parameters Linked to Multiple Myeloma Subgroups

                                     Multiple Myeloma Subgroups
Clinical Parameter                   MM1     MM2     MM3     MM4     P Value
Abnormal cytogenetics                40.0%   5.3%    53.3%   72.2%   .00028
Average β2-microglobulin (mg/L)      2.81    2.73    4.62    8.81    .00047

ANOVA, chi-square, and Fisher's exact tests were used to determine significance.
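For a continuous parameter such as β2-microglobulin, a one-way ANOVA across the four subgroups is the kind of test named in the table footnote. The minimal sketch below computes the F statistic and its degrees of freedom; per-patient values are not given in the source, so the example uses toy numbers. A P value would then come from the upper tail of the F distribution with (df_between, df_within) degrees of freedom, for example via `scipy.stats.f.sf`.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy example with two groups (the same call accepts four subgroup lists).
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1, 4)
```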

EXAMPLE 8 

Altered Expression Of 120 Genes In Malignant Plasma Cells 

The hierarchical cluster analysis disclosed above showed that multiple myeloma plasma cells could be differentiated from normal plasma cells. Genes distinguishing multiple myeloma from normal plasma cells were identified as significant by χ2 analysis and the WRS test (P < .0001). This statistical analysis showed that 120 genes distinguished multiple myeloma from normal plasma cells. Pearson correlation analyses of the 120 differentially expressed genes were used to determine whether each gene was upregulated or downregulated in multiple myeloma.
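One way to read the Pearson correlation step above is to correlate each gene's expression with a 0/1 group indicator (normal = 0, myeloma = 1) and take the sign of r as the direction of change. This interpretation is an assumption, since the source does not spell out what the genes were correlated against; the function names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def direction(expression, is_myeloma):
    """Classify a gene's direction of change from the sign of the Pearson
    correlation between its expression and a 0/1 indicator (1 = myeloma)."""
    r = pearson(expression, [1.0 if flag else 0.0 for flag in is_myeloma])
    if r > 0:
        return "upregulated in MM", r
    if r < 0:
        return "downregulated in MM", r
    return "no trend", r


groups = [False, False, False, True, True, True]   # 3 normal, then 3 myeloma
print(direction([1, 2, 1, 8, 9, 7], groups)[0])    # upregulated in MM
print(direction([9, 8, 9, 1, 2, 1], groups)[0])    # downregulated in MM
```

Correlating with a binary label (a point-biserial correlation) gives the same sign as the difference between group means, so it only orients genes already judged significant by the χ2 and WRS tests.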

When genes associated with immune function (e.g., IGH, IGL, HLA), which represent the majority of significantly downregulated genes, were filtered out, 50 genes showed significant downregulation in multiple myeloma (Table 4). The P values for the WRS test ranged from 9.80 × 10^-5 to 1.56 × 10^-14, and the χ2 values for the test of absence or presence of gene expression in the groups ranged from 18.83 to 48.45. The gene showing the most significant difference in the χ2 test was the CXC chemokine SDF1. It is important to note that a comparison of multiple myeloma plasma cells with tonsil-derived plasma cells showed that, like multiple myeloma plasma cells, tonsil plasma cells also do not express SDF1. Two additional CXC chemokines, PF4 and PF4V1, were also absent in multiple myeloma plasma cells. The second most significant gene was the tumor necrosis factor receptor superfamily member TNFRSF7, coding for CD27, a molecule that has been linked to controlling the maturation and apoptosis of plasma cells.

The largest group of genes (20 of 50) was linked to signaling cascades: multiple myeloma plasma cells have reduced or no expression of genes associated with calcium signaling (S100A9 and S100A12) or lipoprotein signaling (LIPA, LCN2, PLA2G7, APOE, APOC1). LCN2, also known as 24p3, codes for a secreted lipocalin, which has recently been shown to induce apoptosis in pro-B cells after growth factor deprivation. Another major class absent in multiple myeloma plasma cells was adhesion-associated genes (ITGA2B, ITGB2, GP5, VCAM1, and MIC2).

Correlation analysis showed that 70 genes were either turned on or upregulated in multiple myeloma (Table 5). When considering the χ2 test of whether expression is present or absent, the cyclin-dependent kinase inhibitor CDKN1A was the most significantly differentially expressed gene (χ2 = 53.33, WRS P = 3.65 × 10^-?). When considering quantitative change using the WRS test, the tyrosine kinase oncogene ABL1 was the most significant (χ2 = 43.10, WRS P = 3.96 × 10^-14). Other oncogenes in the list included USF2, USP4, MLLT3 and MYC. The largest class of genes represented those whose products are involved in protein metabolism (12 genes), including amino acid synthesis, translation initiation, protein folding, glycosylation, trafficking, and protein degradation. Other multiple-member classes included transcription (11 genes), signaling (9 genes), DNA synthesis and modification (6 genes), and histone synthesis and modification (5 genes).

Overexpression of signaling genes such as QSCN6, PHB, the phosphatases PTPRK and PPP2R4, and the kinase MAPKAPK3 has been linked to growth arrest. The only secreted growth factor in the signaling class was HGF, a factor known to play a role in multiple myeloma biology. The signaling class also included MOX2, whose product is normally expressed as an integral membrane protein on activated T cells and CD19+ B cells and is involved in inhibiting macrophage activation. The tumor suppressor gene and negative regulator of β-catenin signaling, APC, was another member of the signaling class. Classes containing two members included RNA binding, mitochondrial respiration, cytoskeletal matrix, metabolism, cell cycle, and adhesion. Single-member classes included complement cascade (MASP1), drug resistance (MVP), glycosaminoglycan catabolism (HYAL3), heparan sulfate synthesis (EXTL2), and vesicular transport (TSC1). Four genes of unknown function were also identified as significantly upregulated in multiple myeloma.
X59871      transcription              TCF7        26.79    7.16 × 10^-10
X67235      transcription              HHEX        25.21    2.07 × 10^-10
U19713      calcium signaling          AIF1        25.21    2.57 × 10^-10
Y08136      apoptosis                  ASM3A†      24.74    3.30 × 10^-8
M97676      transcription              MSX1        24.58    9.80 × 10^-5
M64590      housekeeping               GLDC        24.27    4.10 × 10^-8
M20203      protease                   ELA2        24.03    6.36 × 10^-12
M30257      adhesion                   VCAM1       23.42    1.71 × 10^-10
M93221      mediates endocytosis       MRC1        23.30    1.15 × 10^-7
S75256      lipoprotein signaling      LCN2        23.30    4.17 × 10^-7
U97188      RNA binding                KOC1†       22.47    5.86 × 10^-9
Z23091      adhesion                   GP5         22.47    7.58 × 10^-7
M34344      adhesion                   ITGA2B      21.99    8.00 × 10^-?
M25897      CXC chemokine              PF4         21.89    1.12 × 10^-8
M31994      housekeeping               ALDH1A1     21.36    4.86 × 10^-?
Z31690      lipoprotein signaling      LIPA        20.67    1.50 × 10^-9
S80267      signaling                  SYK         20.42    5.90 × 10^-5
U00921      signaling                  LY117       18.83    1.57 × 10^-8

* Accession numbers listed are GenBank numbers, except those beginning with "HT", which are provided by The Institute for Genomic Research (TIGR). † Gene symbol not HUGO approved. ‡ Wilcoxon rank sum test.
TABLE 5

The 70 Most Significantly Upregulated Genes in Multiple Myeloma in Comparison with Normal Bone Marrow Plasma Cells

Accession*  Function                                 Gene Symbol†  Chi Square  WRS‡ P Value
U09579      cell cycle                               CDKN1A        53.33       3.65 × 10^-?
U78525      amino acid synthesis                     EIF3S9        49.99       2.25 × 10^-12
HT5158      DNA synthesis                            GMPS          47.11       4.30 × 10^-12
X57129      histone                                  H1F2          46.59       5.78 × 10^-?
M55210      adhesion                                 LAMC1         45.63       1.34 × 10^-9
L77886      signaling, phosphatase                   PTPRK         45.62       5.42 × 10^-10
U73167      glycosaminoglycan catabolism             HYAL3         44.78       1.07 × 10^-10
X16416      oncogene, kinase                         ABL1          43.10       3.96 × 10^-14
U57316      transcription                            GCN5L2        43.04       1.36 × 10^-12
Y09022      protein glycosylation                    NOT56L†       42.05       5.53 × 10^-10
M25077      RNA binding                              SSA2          41.26       1.69 × 10^-7
AC002115    mitochondrial respiration                COX6B         41.16       2.16 × 10^-?
Y07707      transcription                            NRF†          37.59       4.79 × 10^-10
L22005      protein ubiquitination                   CDC34         34.50       2.89 × 10^-6
X66899      transcription                            EWSR1         34.39       4.23 × 10^-8
D50912      RNA binding                              RBM10         33.93       2.61 × 10^-?
HT4824      amino acid synthesis                     CBS           33.77       1.49 × 10^-?
U10324      transcription                            ILF3          33.33       3.66 × 10^-?
AD000684    oncogene, transcription                  USF2          32.18       7.41 × 10^-?
U68723      cell cycle                               CHES1         31.68       1.03 × 10^-?
X16323      signaling, growth factor                 HGF           30.67       4.82 × 10^-?
U24183      metabolism                               PFKM          30.47       8.92 × 10^-10
D13645      unknown                                  KIAA0020†     30.47       7.40 × 10^-6
S85655      signaling, growth arrest                 PHB           29.37       1.32 × 10^-8
X73478      signaling, phosphatase                   PPP2R4        28.32       6.92 × 10^-9
L77701      mitochondrial respiration                COX17         27.81       1.33 × 10^-?
U20657      oncogene, proteasome                     USP4          27.71       2.31 × 10^-?
M59916      signaling, DAG signaling                 SMPD1         27.49       3.52 × 10^-?
D16688      oncogene, DNA binding                    MLLT3         27.24       6.97 × 10^-13
X90392      DNA endonuclease                         DNASE1L1      26.98       4.72 × 10^-7
U07424      amino acid synthesis                     FARSL         26.93       1.66 × 10^-?
X54199      DNA synthesis                            GART          26.57       9.61 × 10^-?
L06175      unknown                                  P5-1†         26.57       5.16 × 10^-7
M55267      unknown                                  EVI2A         25.92       3.79 × 10^-?
M87507      protein degradation                      CASP1         25.78       5.46 × 10^-7
M90356      transcription                            BTF3L2        25.78       9.68 × 10^-?
U35637      cytoskeletal matrix                      NEB           25.40       9.15 × 10^-?
L06845      amino acid synthesis                     CARS          25.34       5.39 × 10^-?
U81001      DNA, nuclear matrix attachment           SNURF         24.58       4.54 × 10^-5
U76189      heparan sulfate synthesis                EXTL2         24.58       7.28 × 10^-?
U53225      protein trafficking                      SNX1          24.48       5.53 × 10^-?
X04366      protein degradation                      CAPN1         24.35       1.26 × 10^-9
U77456      protein folding                          NAP1L4        24.27       4.23 × 10^-10
L42379      signaling, growth arrest                 QSCN6         24.27       1.28 × 10^-10
U09578      signaling, kinase                        MAPKAPK3      24.27       2.35 × 10^-9
Z80780      histone                                  H2BFH         24.27       3.44 × 10^-12
HT4899      oncogene, transcription                  MYC           24.27       1.77 × 10^-5
M74088      signaling, β-catenin regulator           APC           23.94       1.50 × 10^-5
X57985      histone                                  H2BFQ         23.90       3.25 × 10^-12
X79882      drug resistance                          MVP           23.47       1.77 × 10^-?
X77383      protein degradation                      CTSO          23.18       4.68 × 10^-7
M91592      transcription                            ZNF76         23.16       1.12 × 10^-?
X63692      DNA methyltransferase                    DNMT1         23.12       5.15 × 10^-?
M60752      histone                                  H2AFO         21.60       1.46 × 10^-?
M96684      transcription                            PURA          21.25       4.54 × 10^-5
U16660      metabolism                               ECH1          21.18       5.52 × 10^-5
M86737      DNA repair                               SSRP1         20.60       2.62 × 10^-5
U35113      histone deacetylase                      MTA1          20.60       6.67 × 10^-10
X81788      unknown                                  ICT1          20.42       2.97 × 10^-7
HT2217      signaling                                MUC2A         20.33       2.61 × 10^-7
M62324      unknown                                  MRF-1†        20.31       3.98 × 10^-9
U09367      transcription                            ZNF136        20.30       7.72 × 10^-9
X89985      cytoskeletal matrix                      BCL7B         19.81       5.50 × 10^-?
L19871      transcription repression                 ATF3          19.43       1.13 × 10^-?
X69398      adhesion                                 CD47          19.16       6.44 × 10^-7
X05323      signaling, macrophage inhibitor          MOX2          19.16       8.58 × 10^-6
X04741      protein ubiquitination                   UCHL1         19.14       9.76 × 10^-5
D87683      vesicular transport                      TSC1          19.12       6.81 × 10^-?
D17525      complement cascade                       MASP1         18.81       4.05 × 10^-7

* Accession numbers listed are GenBank numbers, except those beginning with "HT", which are provided by The Institute for Genomic Research (TIGR). † Gene symbol not HUGO approved. ‡ Wilcoxon rank sum test.
EXAMPLE 9

Altered Expression Of 14 Genes Differentiates Malignant From Normal Plasma Cells 
The present invention also sought to determine whether the expression patterns of a minimum number of genes could be used to clearly differentiate normal, pre-neoplastic and malignant plasma cells. A multivariate step-wise discriminant analysis (MSDA) was applied to the top 50 significantly differentially expressed genes across normal plasma cells (N = 26) and multiple myeloma plasma cells (N = 162), and a linear discriminant function between the normal plasma cell group and the multiple myeloma group was derived. Both forward and backward variable selection were performed. The choice to enter or remove variables was based on Wilks' lambda (Λ), defined as Λ(x) = det W(x) / det T(x), where W(x) and T(x) are the within-group and total scatter matrices, respectively. Wilks' Λ can assume values ranging from 0 to 1, with smaller values indicating greater separation between the groups. The significance of a change in Λ was tested using the F statistic. At the end of the multivariate step-wise discriminant analysis, only 14 genes were selected to compute the canonical discriminant functions (Table 6). The multivariate step-wise discriminant analysis selected the following equation:

Discriminant score = HG4716 × 3.683 - L33930 × 3.134 + L42379 × 1.284 + L77886 × 1.792 + M14636 × 5.971 - M26167 × 6.834 + U10324 × 2.861 - U24577 × 10.909 + U35113 × 2.309 + X16416 × 6.671 - X64072 × 5.143 + X79882 × 5.53 + Z22970 × 4.147 + Z80780 × 2.64 - 87.795

The cutoff value was -3.3525. Values less than -3.3525 indicated that the sample belonged to the normal group, and values greater than -3.3525 indicated that the sample belonged to the multiple myeloma group.
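The discriminant rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: coefficients are keyed by gene symbol using Table 6 (HG4716 is assumed to be the GMPS probe listed there as HT5158, and the MTA1 and MVP terms are matched to U35113 and X79882), the scale of the input expression values is not stated in the source and is left to the caller, and a score exactly at the cutoff, which the text does not assign, is returned as "unassigned".

```python
# Coefficients of the 14-gene discriminant score, keyed by gene symbol (Table 6).
COEFFICIENTS = {
    "GMPS": 3.683, "CD24": -3.134, "QSCN6": 1.284, "PTPRK": 1.792,
    "PYGL": 5.971, "PF4V1": -6.834, "ILF3": 2.861, "PLA2G7": -10.909,
    "MTA1": 2.309, "ABL1": 6.671, "ITGB2": -5.143, "MVP": 5.53,
    "CD163": 4.147, "H2B": 2.64,
}
INTERCEPT = -87.795
CUTOFF = -3.3525


def discriminant_score(expr):
    """Linear discriminant score for one sample; expr maps gene symbol to value."""
    return sum(coef * expr[gene] for gene, coef in COEFFICIENTS.items()) + INTERCEPT


def classify(expr):
    """Scores above the cutoff are called myeloma, below it normal."""
    score = discriminant_score(expr)
    if score > CUTOFF:
        return "multiple myeloma"
    if score < CUTOFF:
        return "normal"
    return "unassigned"   # exactly at the cutoff: not specified in the source


zeros = {gene: 0.0 for gene in COEFFICIENTS}
print(discriminant_score(zeros), classify(zeros))  # -87.795 normal
```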

The 14-gene model was then applied to a training group consisting of 162 multiple myeloma and 26 normal plasma cell samples (data not shown). A cross-validation analysis was performed in which samples were removed one at a time from the sample set, and the training statistics and expression means for each class of the modified sample set were recalculated. A predictive value using genes with a P value < 0.05 in the modified sample set was generated. A 100% accurate prediction of the sample types in the training group was obtained.
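The leave-one-out procedure above can be sketched as follows. For simplicity, the sketch swaps in a nearest-centroid rule for the discriminant function and omits the per-fold re-selection of genes with P < 0.05, so it illustrates only the hold-out-and-recompute loop. The function names and the two-gene toy data are hypothetical.

```python
def centroid(rows):
    """Per-gene mean of a list of expression profiles."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(samples, labels):
    """Leave-one-out accuracy: hold out each sample, recompute the class means
    from the remaining samples, and predict the held-out sample's class."""
    correct = 0
    for i in range(len(samples)):
        train = [(s, lab) for j, (s, lab) in enumerate(zip(samples, labels)) if j != i]
        classes = sorted({lab for _, lab in train})
        cents = {c: centroid([s for s, lab in train if lab == c]) for c in classes}
        predicted = min(classes, key=lambda c: sq_dist(samples[i], cents[c]))
        correct += predicted == labels[i]
    return correct / len(samples)


toy = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
print(loo_accuracy(toy, ["normal"] * 3 + ["MM"] * 3))  # 1.0
```

Recomputing the class statistics without the held-out sample is what keeps the reported accuracy from simply measuring how well the model memorized its own training data.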
The model was then applied to a validation group. The multivariate step-wise discriminant analysis correctly classified 116 of 118 (98.31%) primary multiple myeloma samples and 8 of 8 (100%) human myeloma cell lines as multiple myeloma. In addition, 6 of 6 normal plasma cell samples were classified as normal. Importantly, the model predicted that 6 of 7 monoclonal gammopathy of undetermined significance cases were multiple myeloma, with 1 monoclonal gammopathy of undetermined significance case predicted to be normal (Figure 5). The classification of the 6 monoclonal gammopathy of undetermined significance cases as multiple myeloma has important ramifications in that it suggests that cells in this benign condition have strong similarities to fully transformed cells. These results also have important implications for the etiology of monoclonal gammopathy of undetermined significance and its transition to overt multiple myeloma. The fact that the model classified monoclonal gammopathy of undetermined significance as multiple myeloma is consistent with recent studies showing that monoclonal gammopathy of undetermined significance has chromosomal abnormalities, e.g., translocations of the IGH locus and deletion of chromosome 13, that are also common in multiple myeloma. Future studies will be aimed at identifying gene expression patterns that can actually distinguish monoclonal gammopathy of undetermined significance from multiple myeloma. With the majority of the monoclonal gammopathy of undetermined significance cases being classified as multiple myeloma, the classification of 1 monoclonal gammopathy of undetermined significance case as normal may indicate that 1) the patient does not have monoclonal gammopathy of undetermined significance or 2) the monoclonal gammopathy of undetermined significance cells represented a minority of the plasma cells in the sample. The monoclonal gammopathy of undetermined significance case and the 2 multiple myeloma cases classified as normal will be followed longitudinally to determine whether the samples shift to the multiple myeloma group in the future.

To further validate the discriminant results, two-dimensional hierarchical clustering was performed on 927 genes with expression in at least one sample. The 118 multiple myeloma samples from the validation group, 32 normal plasma cell samples, 7 multiple myeloma cell lines, and 7 monoclonal gammopathy of undetermined significance samples were studied. Along the horizontal axis, the samples were arranged such that those with the most similar patterns of expression across all genes were placed adjacent to each other. Surprisingly, the two misclassified multiple myeloma samples and the one monoclonal gammopathy of undetermined significance sample classified as normal by discriminant analysis were also connected to the normal group in the cluster analysis (Figure 6). This result indicated that the 14-gene discriminant model was consistent with the 927-gene hierarchical cluster model.

A survey of the functions of the 14 genes in the above analysis revealed several interesting features. The genes are not related in function and thus represent unique and independent genetic markers that can be used as signatures of normal and malignant cells. They include genes associated with the microenvironment (ITGB2), cell transformation (ABL1) and drug resistance (MVP). It is possible that the deregulated expression of these genes represents fundamental genetic abnormalities in the malignant transformation of plasma cells. For example, the ITGB2 gene encodes the glycoprotein β2 integrin (CD18), which is critical to the formation of integrin heterodimers known to mediate cell-cell and/or cell-matrix adhesion events. As plasma cells constitutively express ICAM-1 and this molecule can be induced on bone marrow adherent cells, one can envisage a mechanism in which the ITGB2/ICAM-1 adhesion pathway mediates adhesion among plasma cells as well as with cells in the bone marrow microenvironment. In human lymphomas, ITGB2 expression is found on tumor cells in low- and medium-grade malignant lymphomas, whereas absence of ITGB2 seems to be a characteristic of high-grade malignant lymphomas. As in other B-cell lymphomas, the absence of ITGB2 might contribute to an escape from immunosurveillance in multiple myeloma.

In summary, the present invention describes a model that makes it possible to diagnose multiple myeloma using the differential expression of 14 genes. It is currently not clear whether the deregulated expression of these genes is involved in the creation of the malignant phenotype or whether it is a sentinel of some underlying, yet to be recognized, genetic defect(s). However, the functions of these genes suggest an underlying causal relationship between the deregulated expression and malignancy.
TABLE 6

Fourteen Genes Defining the Optimal Diagnostic Model

Accession*  Gene Symbol  Wilks' Lambda  F to Remove  P Value
HT5158      GMPS         0.090          10.99        0.0011
L33930      CD24         0.089          8.80         0.0034
L42379      QSCN6        0.087          4.24         0.0409
L77886      PTPRK        0.088          6.46         0.0119
M14636      PYGL         0.091          12.62        0.0005
M26167      PF4V1        0.091          12.39        0.0005
U10324      ILF3         0.090          11.98        0.0007
U24577      PLA2G7       0.107          44.28        3.23 × 10^-?
U35113      MTA1         0.088          6.22         0.0135
X16416      ABL1         0.099          27.65        4.04 × 10^-7
X64072      ITGB2        0.097          24.63        1.59 × 10^-?
X79882      MVP          0.098          25.83        9.19 × 10^-7
Z22970      CD163        0.088          6.08         0.0146
Z80780      H2B          0.092          14.58        0.0002

*Accession numbers listed are GenBank numbers, except the one beginning with "HT", which is provided by The Institute for Genomic Research (TIGR).
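Table 6 reports a Wilks' Λ for each selected gene. The NumPy sketch below implements the definition given in Example 9, Λ = det W / det T, with W the pooled within-group scatter matrix and T the total scatter matrix. It does not reproduce the step-wise F-to-enter/F-to-remove bookkeeping, and the function names and toy matrices are hypothetical.

```python
import numpy as np


def scatter(X, center):
    """Scatter (sum of squares and cross-products) matrix of rows of X about `center`."""
    D = X - center
    return D.T @ D


def wilks_lambda(groups):
    """Wilks' lambda = det(W) / det(T) for a list of (n_i x p) sample matrices."""
    X_all = np.vstack(groups)
    T = scatter(X_all, X_all.mean(axis=0))                # total scatter
    W = sum(scatter(G, G.mean(axis=0)) for G in groups)   # pooled within-group scatter
    return np.linalg.det(W) / np.linalg.det(T)


# One gene, two well-separated toy groups: W = 4, T = 58, so lambda is about 0.069.
a = np.array([[1.0], [2.0], [3.0]])
b = np.array([[7.0], [8.0], [9.0]])
print(round(wilks_lambda([a, b]), 4))  # 0.069
```

Values near 0 mean the group means are far apart relative to the spread within groups; identical groups give Λ = 1.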
EXAMPLE 10

Differential Expression of 24 Genes Can Accurately Differentiate Gene Expression 
Defined Subgroups of Multiple Myeloma 

The present invention also sought to determine whether the expression patterns of a minimum number of genes could be used to clearly differentiate the gene expression-defined subgroups of multiple myeloma identified by hierarchical clustering of over 5,000 genes. As discussed above, two-dimensional cluster analysis of 263 multiple myeloma cases, 14 normal plasma cell samples, 7 MGUS and 7 multiple myeloma cell lines was performed. The sample dendrogram showed four subgroups, MM1, MM2, MM3 and MM4, containing 50, 75, 67, and 71 patients, respectively. The top 120 statistically significant differentially expressed genes, as determined by χ2 and Wilcoxon tests comparing 31 normal plasma cell samples and 74 newly diagnosed multiple myeloma samples, were then chosen for use in a canonical discriminant analysis. By applying a linear regression analysis, 24 genes were defined as predictors able to differentiate the multiple myeloma subgroups (Table 7).

The 24-gene predictor model was applied to a training group consisting of multiple myeloma plasma cell samples located in the center of each hierarchical clustering group [total N = 129; MM1 = 23, MM2 = 33, MM3 = 34 and MM4 = 39]. A cross-validation analysis was performed on the training group in which samples were removed one at a time from the sample set, and the training statistics and expression means for each class of the modified sample set were recalculated. A predictive value using genes with a P value < 0.05 in the modified sample set was generated. This analysis gave a 100% accurate prediction of the sample types in the training group.

The model was then applied to a validation group. The multivariate step-wise discriminant analysis correctly classified 116 of 134 (86.57%) primary multiple myeloma samples into the same subgroups as those defined by hierarchical clustering. Importantly, 7 of 7 (100%) human myeloma cell lines were classified as MM4, as expected. In addition, the model predicted that 5 of 7 MGUS cases were MM1, and the remaining two cases were predicted to be MM2 and MM3, respectively (Figure 7).
TABLE 7

Twenty-Four Genes Defining Subgroups of Multiple Myeloma

Accession No.*  Gene Symbol  Wilks' Lambda  F to Remove  P Value
X54199          GART         0.004          3.13         0.0791
M20902          APOC1        0.005          4.05         0.0462
X89985          BCL7B        0.005          4.47         0.0365
M31158          PRKAR2B      0.005          5.07         0.0260
U44111          HNMT         0.005          5.68         0.0186
X16416          ABL1         0.005          6.72         0.0106
HT2811          NEK2         0.005          8.35         0.0045
D16688          MLLT3        0.005          8.36         0.0045
U57316          GCN5L2       0.005          8.49         0.0042
U77456          NAP1L4       0.005          8.57         0.0040
D13645          KIAA0020     0.005          9.17         0.0030
M64590          GLDC         0.005          9.92         0.0020
L77701          COX17        0.005          10.01        0.0019
U20657          USP4         0.005          11.10        0.0011
L06175          P5-1         0.005          11.11        0.0011
M26311          S100A9       0.005          11.20        0.0011
X04366          CAPN1        0.005          11.67        0.0009
AC002115        COX6B        0.006          13.64        0.0003
X06182          C-KIT        0.006          13.72        0.0003
M16279          MIC2         0.006          16.12        0.0001
M97676          MSX1         0.006          16.41        0.0001
U10324          ILF3         0.006          19.66        0.0000
S85655          PHB          0.007          20.63        0.0000
X63692          DNMT1        0.007          21.53        0.0000

*Accession numbers listed are GenBank numbers, except the one beginning with "HT", which is provided by The Institute for Genomic Research (TIGR).
EXAMPLE 11

Gene Expression "Spikes" in Subsets of Multiple Myeloma 

A total of 156 genes that were not identified as differentially expressed in the
statistical analysis of multiple myeloma versus normal plasma cells, yet were highly
overexpressed in subsets of multiple myeloma, were also identified. The 25 genes with an
AD spike greater than 10,000 in at least one sample are shown in Table 8. With 27
spikes, the adhesion-associated gene FBLN2 was the most frequently spiked gene. The gene
for interferon-induced protein 27, IFI27, was the second most frequently spiked gene
(25 spikes) and had the highest number of spikes over 10,000 (N = 14). The FGFR3 gene
was spiked in 9 of the 74 cases (Figure 2A), and it was the only gene for which all
spikes were greater than 10,000 AD: the lowest AD value was 18,961 and the highest was
62,515, the highest of all spikes. The FGFR3 spikes suggested that these spikes were
induced by the multiple myeloma-specific, FGFR3-activating t(4;14)(p21;q32)
translocation. To test this hypothesis, RT-PCR for a t(4;14)(p21;q32)
translocation-specific fusion transcript between the IGH locus and the gene MMSET was
performed (data not shown). The translocation-specific transcript was present in all 9
FGFR3 spike samples but was absent in 5 samples that did not express FGFR3. These data
suggested that the spike was caused by the t(4;14)(p21;q32) translocation.
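The three numeric columns of Table 8 (number of spikes, spikes above 10,000 AD, and maximum spike) can be derived per gene as sketched below. This is a minimal illustration under an explicit assumption: the rule used to call a value a "spike" is not given in this section, so the hypothetical helper `spike_summary` takes the AD values of samples already called as spikes.

```python
from dataclasses import dataclass


@dataclass
class SpikeSummary:
    gene: str
    n_spikes: int      # "# of Spikes" column in Table 8
    n_over_10k: int    # "Spikes >10K" column
    max_spike: float   # "Max Spike" column

    @property
    def all_above_threshold(self) -> bool:
        # True for a gene like FGFR3, where every spike exceeds the threshold.
        return self.n_over_10k == self.n_spikes


def spike_summary(gene: str, spike_ads: list[float], threshold: float = 10_000) -> SpikeSummary:
    """Summarize the AD values of samples already called as spikes for one gene.

    Assumption: spike calling happens upstream; this only tallies the results.
    """
    if not spike_ads:
        raise ValueError(f"{gene} has no spikes")
    return SpikeSummary(
        gene=gene,
        n_spikes=len(spike_ads),
        n_over_10k=sum(1 for ad in spike_ads if ad > threshold),
        max_spike=max(spike_ads),
    )


if __name__ == "__main__":
    # Hypothetical AD values, for illustration only (not data from the study).
    print(spike_summary("GENE_X", [4_200.0, 12_500.0, 18_961.0]))
```

A gene whose `all_above_threshold` is true corresponds to the FGFR3 pattern described above.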

The CCND1 gene was spiked with AD values greater than 10,000 in
13 cases. TRI-FISH analysis for the t(11;14)(q13;q32) translocation was performed
(Table 9). All 11 evaluable samples were positive for the t(11;14)(q13;q32)
translocation by TRI-FISH; 2 samples could not be analyzed because of loss of cell
integrity during storage. Thus, all FGFR3 and CCND1 spikes could be accounted for
by the presence of the t(4;14)(p21;q32) or the t(11;14)(q13;q32) translocation,
respectively.
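The FISH concordance reported above can be recomputed directly from Table 9. In this small sketch, samples marked "Not done" are excluded from the denominator, mirroring the "11 evaluable samples" wording; the helper name `concordance` is illustrative only.

```python
# (patient, CCND1 spike AD value, FISH t(11;14) result) transcribed from Table 9.
TABLE_9 = [
    ("P168", 42813, "Yes"), ("P251", 33042, "Yes"), ("P91", 31030, "Not done"),
    ("P99", 29862, "Yes"), ("P85", 26737, "Yes"), ("P241", 25611, "Yes"),
    ("P56", 23654, "Yes"), ("P63", 22358, "Yes"), ("P199", 18761, "Yes"),
    ("P107", 15205, "Yes"), ("P75", 14642, "Yes"), ("P187", 14295, "Yes"),
    ("P124", 10594, "Not done"),
]


def concordance(rows):
    """Return (n_spikes, n_evaluable, n_positive) for spikes with AD > 10,000."""
    spikes = [r for r in rows if r[1] > 10_000]
    evaluable = [r for r in spikes if r[2] != "Not done"]
    positive = [r for r in evaluable if r[2] == "Yes"]
    return len(spikes), len(evaluable), len(positive)


if __name__ == "__main__":
    n, evaluable, positive = concordance(TABLE_9)
    print(f"{n} spikes, {positive}/{evaluable} evaluable samples FISH-positive")
```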

Next, the distribution of the FGFR3, CST6, IFI27, and CCND1 spikes
within the gene expression-defined multiple myeloma subgroups was determined
(Figure 2). FGFR3 and CST6 spikes were more likely to be found in MM1 or MM2
(P < .005), whereas IFI27 spikes were associated with an MM3 and MM4 distribution
(P < .005). CCND1 spikes were not associated with any specific subgroup (P > .1).
It is noteworthy that CST6 and CCND1 both map to 11q13 but had no overlap in spikes.
It remains to be tested whether CST6 overexpression is due to a variant
t(11;14)(q13;q32) translocation. The five spikes for MS4A2 (CD20) were found in
either the MM1 (3 spikes) or MM2 (2 spikes) subgroup (data not shown).
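The P values above are reported without naming the test. As one standard option for asking whether spikes are over-represented in a subgroup, a one-sided Fisher exact test on a 2x2 table can be sketched as follows; the choice of test and the example counts are assumptions, not the analysis used in the source.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(X >= a) under the hypergeometric null for the 2x2 table

                      spike   no spike
        in subgroup     a        b
        not in group    c        d
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    upper = min(row1, col1)
    # Sum the probabilities of tables at least as extreme (more spikes in the subgroup).
    return sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1)) / denom


if __name__ == "__main__":
    # Hypothetical counts: all 3 spikes fall in a 3-sample subgroup out of 6 samples.
    print(fisher_one_sided(3, 0, 0, 3))  # 1/20 = 0.05
```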

The gene MS4A2, which codes for the CD20 molecule, was also found
as a spiked gene in four cases (Figure 3A). To investigate whether spiked gene
expression correlated with protein expression, immunohistochemistry for CD20 was
performed on biopsies from 15 of the 74 multiple myeloma samples (Figure 3B). All
four cases that had spiked MS4A2 gene expression were also positive for CD20
protein expression, whereas the 11 cases that had no MS4A2 gene expression were also
negative for CD20 by immunohistochemistry. For additional validation of the gene
expression profiling, CD marker protein and gene expression were compared in the
multiple myeloma cell line CAG and the EBV-transformed lymphoblastoid cell line
ARH-77 (Figure 4). CD138 and CD38 protein and gene expression was high in CAG but
absent in ARH-77 cells. Conversely, expression of CD19, CD20, CD21, CD22, CD45, and
CDw52 was strong in ARH-77 and absent in CAG cells. The nearly 100% coincidence of
FGFR3 or CCND1 spiked gene expression with the presence of the t(4;14)(p21;q32)
or t(11;14)(q13;q32) translocation, the strong correlation of CD20 protein and MS4A2
gene expression in primary multiple myeloma, and the strong correlation of CD marker
protein and gene expression in B cells and plasma cells represent important validations
of the accuracy of the gene expression profiling disclosed herein.
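The MS4A2/CD20 agreement above is reported as raw counts (4 concordant positive and 11 concordant negative cases out of 15). Cohen's kappa is one common chance-corrected summary of such a 2x2 table; it is added here as an illustration and is not an analysis reported in the source.

```python
def cohens_kappa(both_pos: int, gene_only: int, protein_only: int, both_neg: int) -> float:
    """Chance-corrected agreement between gene spikes and IHC protein calls."""
    n = both_pos + gene_only + protein_only + both_neg
    observed = (both_pos + both_neg) / n
    # Expected agreement if the two methods were independent, from marginal totals.
    gene_pos = (both_pos + gene_only) / n
    prot_pos = (both_pos + protein_only) / n
    expected = gene_pos * prot_pos + (1 - gene_pos) * (1 - prot_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print(cohens_kappa(4, 0, 0, 11))  # -> 1.0 (perfect agreement)
```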
TABLE 8

Genes with "Spiked" Expression in Plasma Cells from Newly Diagnosed Multiple
Myeloma Patients

Accession No.*  Function                      Gene Symbol  # of Spikes  Spikes >10K  Max Spike
M64347          signaling                     FGFR3        9            9            62,515
U89922          immunity                      LTB          4            2            49,261
X67325          interferon signaling          IFI27        25           14           47,072
X59798          cell cycle                    CCND1        6            13           42,814
U62800          cysteine protease inhibitor   CST6         17           6            36,081
U35340          eye lens protein              CRYBB1       4            1            35,713
X12530          B-cell signaling              MS4A2        5            5            34,487
X59766          unknown                       AZGP1        18           4            28,523
U58096          unknown                       TSPY         4            1            23,325
U52513          interferon signaling          IFIT4        5            2            21,078
X76223          vesicular trafficking         MAL          19           5            20,432
X92689          O-linked glycosylation        GALNT3       4            1            18,344
D17427          adhesion                      DSC3         8            7            17,616
L11329          signaling                     DUSP2        14           1            15,962
L13210          adhesion, macrophage lectin   LGALS3BP     8            2            14,876
U10991          unknown                       G2†          7            1            14,815
L10373          integral membrane protein     TM4SF2       4            2            14,506
U60873          unknown                       137308       12           1            12,751
M65292          complement regulation         HFL1         9            1            12,718
HT4215          phospholipid transport        PLTP         23           1            12,031
D13168          growth factor receptor        ENDRB        18           1            11,707
AC002077        signaling                     GNAT1        21           1            11,469
M92934          growth factor                 CTGF         4            1            11,201
X82494          adhesion                      FBLN2        27           7            10,648
M30703          growth factor                 AR           5            1            10,163

*Accession numbers listed are GenBank numbers, except those beginning with
"HT", which are provided by The Institute for Genomic Research (TIGR). †Gene
symbol not HUGO approved.
Table 9

Correlation of CCND1 Spikes with FISH-Defined t(11;14)(q13;q32)

GC PT*  CCND1 Spike (AD value†)  FISH t(11;14)  Percent PCs with Translocation  Cells Counted
P168    42,813                   Yes            59%                             113
P251    33,042                   Yes            80%                             124
P91     31,030                   Not done       -                               -
P99     29,862                   Yes            65%                             111
P85     26,737                   Yes            92%                             124
P241    25,611                   Yes            96%                             114
P56     23,654                   Yes            100%                            106
P63     22,358                   Yes            98%                             104
P199    18,761                   Yes            60%                             35
P107    15,205                   Yes            100%                            147
P75     14,642                   Yes            100%                            105
P187    14,295                   Yes            25%                             133
P124    10,594                   Not done       -                               -

*GC PT = patient identifier; †AD = average difference call.

EXAMPLE 12

Endothelin B Receptor as a Potential Therapeutic Target of Multiple Myeloma

As disclosed above, the present invention has identified a number of genes
that have significantly different expression levels in plasma cells derived from multiple
myeloma compared with those of normal controls. Genes that are significantly
up-regulated or down-regulated in multiple myeloma are potential therapeutic targets of
multiple myeloma. Examples of these genes are listed in Tables 4, 5 and 8. Among these
differentially expressed genes is endothelin B receptor (ENDBR). This gene was not
expressed in normal plasma cells but showed highly elevated expression in a subset of
myeloma; it now appears to be highly expressed in 30-40% of myeloma patients. Figure 8
shows a comparison of ENDBR expression in normal plasma cells and in approximately
200 myeloma patients (P1 through P226). ENDBR was either off or highly expressed in
multiple myeloma patients (Figure 8A). ENDBR expression levels were approximately
the same in newly diagnosed and previously treated patients, suggesting that its
activation is not a progression event (Figure 8B).
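Because ENDBR is described as either off or highly expressed, the fraction of ENDBR-positive patients can be estimated by a single cutoff on AD values. The sketch below assumes such a cutoff exists; the cutoff of 1,000 AD and the example values are hypothetical and are not taken from Figure 8.

```python
def fraction_on(ad_values: list[float], cutoff: float = 1_000.0) -> float:
    """Fraction of samples whose AD value exceeds a (hypothetical) on/off cutoff."""
    if not ad_values:
        raise ValueError("no samples")
    return sum(ad > cutoff for ad in ad_values) / len(ad_values)


if __name__ == "__main__":
    # Hypothetical AD values; average difference values can be negative.
    values = [-120.0, 35.0, 8_400.0, 210.0, 12_900.0, 90.0, 6_100.0, -40.0, 150.0, 75.0]
    print(f"{fraction_on(values):.0%} of samples ENDBR-on")
```

For a strongly bimodal gene, the estimate is insensitive to where the cutoff falls within the gap between the two modes.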

Several important features of ENDBR should be noted. The ENDBR gene
is located on chromosome 13. This is of potential significance given that chromosome 13
abnormalities, such as translocations or deletions, represent one of the most powerful
negative risk factors in multiple myeloma. Thus, hyperactivation of ENDBR expression
could be an indicator of poor prognosis in multiple myeloma. There are also extensive
reports linking endothelin signaling to cell growth, and endothelins have been shown to
activate several key molecules with documented pathological roles in plasma cell
tumorigenesis. Of note are the c-MYC oncogene, which is activated in 100% of mouse
plasmacytomas and hyperactivated in many primary human myeloma cells, and IL-6, a
major growth and survival factor for myeloma cells. The endothelins also appear to
signal through the phospholipase C pathway, a major signaling pathway in B cells.
Moreover, a recent paper reported that blocking endothelin signaling inhibited the
proliferation of Kaposi's sarcoma cells.

When the tumor cells of multiple myeloma patients were taken out of the
bone marrow microenvironment, in a large proportion of patients the tumor cells did not
appear to express endothelin genes; in most cases they lacked expression of endothelin 1,
2 and 3. However, when the myeloma cells were taken out of the bone marrow and
cultured for 48-72 hours on a proprietary feeder layer that mimics the bone marrow
microenvironment, endothelin 1 gene expression was massively up-regulated in both the
myeloma cells (P323 and P322) and the feeder layer (Figure 9). Hence, a major variable
within multiple myeloma may be the availability of endothelins. Enhanced production of
endothelins coupled with up-regulated expression of ENDBR in local areas may
contribute to the neoplastic phenotype of multiple myeloma, and blocking the interaction
between endothelins and endothelin receptors may disrupt the development of the
malignant phenotype.
EXAMPLE 13

Comparative Gene Expression Profiling of Human Plasma Cell Differentiation

Examples 13-15 describe global gene expression profiling that reveals
distinct changes in transcription associated with human plasma cell differentiation.

Data presented below demonstrate for the first time that highly purified
plasma cells can be isolated from two unique hematopoietic organs, tonsil and bone
marrow. Purification of millions of cells eliminated background "noise" from nonspecific
cell types (see Figure 10), allowing accurate genetic profiling and characterization of
these samples with highly sensitive gene expression profiling technology. The results
disclosed herein characterize the molecular transcription changes associated with
different cell stages, and especially the distinguishing differences among plasma cells, a
cell type previously thought to represent an end-stage differentiation product based on
morphological criteria.

The CD19+ tonsil B cells and CD138+ plasma cells isolated from tonsil
and bone marrow used in the study represent homogeneous populations with unique
phenotypic characteristics. Thus, the results presented are based on well-characterized
cells, as shown by flow cytometry, morphology, and expression of cytoplasmic
immunoglobulin (cIg). These results are important because, although great efforts have
been made to understand B cell development, little is known about plasma cells, most
likely because of their scarcity; most previous studies have focused only on flow
cytometric characterization.

Another unique finding from the results is that B cells and plasma cells
segregated into two branches in a hierarchical gene expression cluster analysis.
Further, within the plasma cell branch, tonsil plasma cells could be distinguished from
bone marrow plasma cells, indicating that the cells represent unique stages of
development, as suspected from their derivation from different hematopoietic organs.
Genes identified herein (e.g., cell surface markers and transcription factors) matched
those previously identified as distinguishing late-stage B cell development. In addition to
the novel genes found, previously identified genes followed expected patterns of up- and
down-regulation and matched genes already shown to be linked to plasma cell
differentiation or to be essential transcription factors for plasma cell differentiation.

Although cells at distinct stages of B cell development express CD19, it is
likely that the majority of the tonsil B cells studied here represent germinal center
centroblasts. Centrocytes and centroblasts of germinal centers can be distinguished by
the expression of CD44 (centrocytes, CD44+; centroblasts, CD44-). Expression of the
CD44 gene was undetectable in the tonsil B cell samples used in this study. In addition,
the high level of expression of genes linked to proliferation, e.g., MKI67, PCNA, and
CCNB1 (data not shown), suggests that blasts make up the largest population among the
tonsil B cells. Finally, MYBL1, whose expression is a marker of CD38+CD39-
centroblasts, was highly expressed in the tonsil B cells, down-regulated in tonsil plasma
cells (P = 0.00068), and extinguished in bone marrow plasma cells. Because centroblasts
have already undergone switch recombination, the tonsil B cells studied here represent an
optimal late-stage B cell population for a comparative study of gene expression changes
associated with early plasma cell differentiation.

A representative analysis of the normal cell types used in this study is
presented in Figure 10. FACS analysis of the tonsil preparations before sorting indicated
that CD20hi/CD38lo cells represented 70% and CD38+/CD20- cells represented 30% of
the population (Figures 10a, b). After anti-CD19 immunomagnetic bead selection, the
CD20hi/CD38lo/- cells were enriched to 98%, and the CD38+/CD20-, CD138+/CD20+,
and CD138+/CD38+ fractions represented 1% of the population (Figures 10b, c, e, f).
The purified fraction also showed that the majority of cells had typical B cell
morphology (Figure 10g). Immunofluorescence microscopy with anti-kappa and
anti-lambda antibodies indicated slight contamination with cIg+ CD19+ cells
(Figure 10h).

Before tonsil plasma cell isolation, FACS analysis of the tonsil
mononuclear fractions indicated that CD38hi/CD45- (Figure 10i) and CD138hi/CD45-
cells (Figure 10j) represented 2.4% of the population. After anti-CD138
immunomagnetic bead sorting, cells with a plasma cell phenotype that was either
CD38hi/CD45lo (95%), CD138hi/CD45lo (94%), CD38hi/CD20lo (91%), or
CD138hi/CD38hi (92%) were greatly enriched (Figures 10k, l, m, n). The tonsil
CD138-selected cells also had typical plasma cell morphology, with an increased
cytoplasmic-to-nuclear ratio and a prominent perinuclear hof or endoplasmic reticulum
(Figure 10o), and >95% of the cells were cIg positive (Figure 10p).
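The degree of purification can be expressed as a fold enrichment, using the percentages quoted above. This is illustrative arithmetic only: the pre-sort (2.4%) and post-sort (95%, CD38hi/CD45lo) gates are not identical, so the result is approximate, and `fold_enrichment` is a hypothetical helper name.

```python
def fold_enrichment(pre_pct: float, post_pct: float) -> float:
    """Ratio of the post-sort to pre-sort percentage of the target population."""
    if pre_pct <= 0:
        raise ValueError("pre-sort percentage must be positive")
    return post_pct / pre_pct


if __name__ == "__main__":
    print(f"{fold_enrichment(2.4, 95.0):.1f}-fold")  # about 39.6-fold
```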
FACS analysis of bone marrow mononuclear cell samples prior to anti-CD138
immunomagnetic bead sorting showed profiles similar to, but distinct from, those of the
tonsil preparations. The CD38hi/CD45int and CD138hi/CD45int fractions showed more
cells with lower CD45 expression and a higher percentage of CD138+ cells in the bone
marrow plasma cells (Figures 10q, r). FACS analysis after purification showed that the
CD38hi/CD45- and CD38hi/CD20lo cells were enriched to 99% and 91%, respectively
(Figures 10s, u). Differences between tonsil plasma cells and bone marrow plasma cells
after sorting were also evident: whereas the tonsil plasma cells had clear evidence of
CD38+/CD45+ and CD38+/CD20+ cells, these fractions were greatly reduced in the
bone marrow CD138-selected cells. Bone marrow plasma cells also expressed higher
levels of CD38 than tonsil plasma cells (Figures 10s, k). The CD138hi/CD45- and
CD138hi/CD38hi populations represented 96% and 95% of the bone marrow plasma cell
population (Figures 10t, v), again with fewer CD45+ cells and a higher percentage of
CD38+ cells compared with tonsil plasma cells. As with the tonsil plasma cells, the
majority of the bone marrow cells had plasma cell morphology (Figure 10w) and were
cIg positive (Figure 10x). Thus, immunomagnetic bead selection resulted in the
purification of a relatively homogeneous tonsil B cell population and distinct plasma cell
populations from two different organs, likely representing cells at different stages of
maturation.

After the phenotypic characteristics of the cells had been demonstrated, global
mRNA expression was analyzed in 7 tonsil B cell, 11 tonsil plasma cell, and 31
bone marrow plasma cell samples using the Affymetrix high-density oligonucleotide
microarray interrogating approximately 6800 named and annotated genes. The mean
AD expression values of the genes for the CD markers used in the cell analysis, as well as
for other CD markers, chemokine receptors, an apoptosis regulator, and a panel of
transcription factors, were analyzed across the normal samples (Table 10). CD45 was
highly expressed in tonsil B cells, with lower expression in tonsil plasma cells, and was
absent in bone marrow plasma cells. The genes for CD20, CD79B, CD52, and CD19
showed CD45-like expression patterns, with progressive down-regulation from tonsil B
cells to tonsil plasma cells. Although CD21 showed no significant change from tonsil B
cells to tonsil plasma cells, the gene was down-regulated in bone marrow plasma cells.
CD22, CD83, and CD72 showed progressive down-regulation.
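Table 10 reports each gene as a group mean AD value plus or minus a standard deviation, and the text describes "progressive down-regulation" across the three cell groups. A minimal sketch of both steps follows. The per-sample AD values are hypothetical, and the use of the sample standard deviation (n - 1) is an assumption, since the table does not state which was used.

```python
from statistics import mean, stdev


def summarize(values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of one group's AD values."""
    return mean(values), stdev(values)


def progressive_down(tbc: list[float], tpc: list[float], bpc: list[float]) -> bool:
    """True if group means strictly decrease from tonsil B cells to BM plasma cells."""
    return mean(tbc) > mean(tpc) > mean(bpc)


if __name__ == "__main__":
    groups = {  # hypothetical AD values, not data from Table 10
        "TBC": [21000.0, 25000.0, 26000.0],
        "TPC": [3000.0, 4000.0, 5000.0],
        "BPC": [100.0, 300.0, 500.0],
    }
    for name, values in groups.items():
        m, s = summarize(values)
        print(f"{name}: {m:.0f} ± {s:.0f}")
    print(progressive_down(*groups.values()))
```

A strict trend in the means is only descriptive. The significance flags in Table 10 come from the statistical comparisons described in the text.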
Consistent with the FACS analysis, the key plasma cell differentiation
antigens syndecan-1 (CD138) and CD38 were absent or weakly expressed in tonsil B cells,
with intermediate levels in tonsil plasma cells and the highest expression in bone marrow
plasma cells. The intermediate level of CD138 expression is likely a direct reflection of
the heterogeneous mixture of CD138+ cells in the tonsil plasma cell fraction (see above),
with some cells highly CD138+ and others weakly positive but still able to be sorted
based on surface expression of CD138. CD38 expression showed the same progressive
increase seen with CD138 in the normal cells.

It was also observed that the CD63 gene was significantly up-regulated in
bone marrow plasma cells. This is the first indication that this marker may be
differentially regulated during plasma cell differentiation. The gene for CD27 showed
significant up-regulation in the B cell to tonsil plasma cell transition, whereas bone
marrow plasma cells and tonsil plasma cells showed similar levels.

Transcription factors differentially expressed in plasma cell development
showed the expected changes. IRF4 and XBP1 were significantly up-regulated in tonsil
and bone marrow plasma cells, and CIITA, BCL6, and STAT6 were down-regulated in the
plasma cell samples. BSAP (PAX5) did not show the expected changes, but this is
believed to be due to an ineffective probe set for the gene, because the BSAP target gene
BLK did show the expected down-regulation in the tonsil and bone marrow plasma cells.
Interestingly, whereas MYC showed significant down-regulation in the tonsil B cell to
tonsil plasma cell transition, the gene was reactivated in bone marrow plasma cells to
levels higher than those seen in the tonsil B cells. The chemokine receptors CXCR4 and
CXCR5 both showed down-regulation in the tonsil B cell to tonsil plasma cell transition,
but CXCR4 showed a MYC-like profile in that the gene was reactivated in bone marrow
plasma cells. The BCL2 homologue BCL2A1 also showed the expected changes. Thus,
gene expression patterns of cell surface markers are consistent with phenotypic patterns,
and genes known to be strongly associated with plasma cell differentiation showed the
anticipated patterns. These data support the notion that tonsil B cells, tonsil plasma
cells, and bone marrow plasma cells represent distinct stages of B-cell differentiation and
that gene expression profiling of these cells can be used to gain a better understanding of
the molecular mechanisms of differentiation.
TABLE 10

Gene Expression Of CD Markers And Proteins Known To Be Differentially Expressed
During Plasma Cell Differentiation

Accession  Symbol   TBC          TPC           BPC
Y00062     CD45     1149512198   497912522     13851706
M27394     CD20     23860±5494   379912977     2891358
M89957     CD79B    14758±3348   469612440     124311357
X62466     CD52     1457612395   434812074     283111002
M84371     CD19     1233911708   617411345     28521852
M26004     CD21     890911640    543414053     458114.0
X59350     CD22     1034911422   535611610     19291612
Z11697     CD83     920111900    238011087     3921403
M54992     CD72     617711620    8651554       4541548
Z48199     CD138    7191519      993513545     2464316206
D84276     CD38     31221967     983313419     1483613462
oi\ n+Ai 1   /.i i yjjL^j i   C.Q\ C+1 C©9   I OO / OXJJUJ
M63928     CD27     623511736    1593716691    1671414442
M31627     XBP1     1297811676   54912113649   49558110798
U52682     IRF4     18631630     842213061     1134813118
U00115     BCL6     797911610    330312070     6181335 5
X74301     CIITA    15531263     2361217       113182
U16031     STAT6    13141512     3861335       1911187
S76617     BLK      365411551    3881592       95186
X68149     CXCR5    338111173    1831299       921183
U29680     BCL2A1   329011073    11211817      4831209
L00058     MYC      15281474     3481239       21031903
L06797     CXCR4    1191112093   667313508     1803315331

Accession = GenBank accession number. Symbol = HUGO-approved gene symbol.
The numbers in the columns under the tonsil B cell (TBC), tonsil plasma cell (TPC), and
bone marrow plasma cell (BPC) samples represent the mean average difference (AD)
value ± the standard deviation (STD) for the given gene. Differences in expression across
comparisons were significant (P < 0.01) unless indicated in bold.
EXAMPLE 14

Identification of Differentially Expressed Genes in the Tonsil B Cell to Tonsil Plasma
Cell and in the Tonsil Plasma Cell to Bone Marrow Plasma Cell Transitions

A more detailed and comprehensive evaluation was performed to
determine the gene expression changes that accompany the transition of tonsil B cells to
tonsil plasma cells and the changes that occur as the immature tonsil plasma cells exit the
lymph node germinal center and migrate to the bone marrow. To reveal global expression
distinctions among the samples, hierarchical cluster analysis was performed with 4866
genes in 7 tonsil B cell, 7 tonsil plasma cell, and 7 bone marrow plasma cell cases
(Figure 11). As expected, this analysis revealed a major division between the tonsil B
cell samples and the plasma cell samples, with the exception of one tonsil plasma cell
sample that clustered with the tonsil B cells. The normal plasma cells were further
subdivided into two distinct groups of tonsil plasma cells and bone marrow plasma cells.
Thus, global gene expression patterns confirmed the segregation of tonsil plasma cells
and bone marrow plasma cells and also allowed the distinction of tonsil B cells from both
plasma cell types.
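The hierarchical cluster analysis can be illustrated with a small agglomerative clustering sketch. The section does not state the distance metric or linkage used for Figure 11, so this example assumes average linkage on 1 - Pearson correlation, a common choice for expression profiles. The sample names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError for a constant profile."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_linkage(profiles: dict[str, list[float]], k: int) -> list[set[str]]:
    """Repeatedly merge the two closest clusters until `k` clusters remain."""
    dist = {
        frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def avg_dist(c1: set[str], c2: set[str]) -> float:
        # Average linkage: mean pairwise distance between members of two clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [{name} for name in profiles]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters


if __name__ == "__main__":
    # Hypothetical 4-gene profiles: two B-cell-like and two plasma-cell-like samples.
    samples = {
        "B1": [9.0, 8.0, 1.0, 2.0],
        "B2": [8.0, 9.0, 2.0, 1.0],
        "P1": [1.0, 2.0, 9.0, 8.0],
        "P2": [2.0, 1.0, 8.0, 9.0],
    }
    print(average_linkage(samples, k=2))
```

Stopping at `k` clusters corresponds to cutting the dendrogram. A full implementation would also record merge heights to draw the tree.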

χ2 and Wilcoxon rank sum analyses were used to identify 359 and 500
genes whose mRNA expression levels were significantly altered (P < .00005) in the tonsil
B cell to tonsil plasma cell and the tonsil plasma cell to bone marrow plasma cell
comparisons, respectively. Genes that were significantly differentially expressed in the
tonsil B cell to tonsil plasma cell transition were referred to as "early differentiation
genes" (EDGs), and those differentially expressed in the tonsil plasma cell to bone marrow
plasma cell transition were referred to as "late differentiation genes" (LDGs).
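One of the two tests named above, the Wilcoxon rank sum test, can be sketched for a single gene as follows. Assumptions: tied values receive average ranks, no tie or continuity correction is applied, and the P value uses the normal approximation, which is only rough for groups as small as 7 samples. The source does not state these implementation details.

```python
from math import erf, sqrt


def rank_sum(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Return (W, z, two-sided P), where W is the rank sum of group `x`."""
    pooled = sorted((v, g) for g, grp in ((0, x), (1, y)) for v in grp)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for a block of ties
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    # Two-sided P from the standard normal CDF, Phi(t) = (1 + erf(t / sqrt 2)) / 2.
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return w, z, p


if __name__ == "__main__":
    # Hypothetical AD values of one gene in two cell groups.
    print(rank_sum([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Applied gene by gene, a very strict cutoff such as P < .00005 limits the number of false positives across thousands of genes.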

Early Differentiation Genes

Of the top 50 EDGs (Table 11), most (43) were down-regulated, with only
7 genes being up-regulated in this transition. Gene expression was described as being at
1 of 5 possible levels. An AAC, indicating an undetectable or absent gene transcript, was
defined as "-". For all the samples in a group, expression levels were defined as "+" if
the gene transcript was present and the AD was <1000, "++" for 1000<AD<5000, "+++"
for 5000<AD<10,000, and "++++" for AD>10,000. The largest group of EDGs encoded
transcription factors. Of 16 transcription factors, only 3, XBP-1, IRF4 and BMI1, were
up-regulated EDGs. Among the down-regulated transcription factors were MYC and
CIITA. The largest family included four ets domain-containing proteins: ETS1, SPIB,
SPI1, and ELF1. Other transcription factors included the repressors EED and ID3, as
well as the activators RUNX3, ICSBP1, REL, EGR3, and FOXM1. It is of potential
significance that, whereas IRF4 is up-regulated in both the tonsil B cell to tonsil plasma
cell and the tonsil plasma cell to bone marrow plasma cell transitions, the IRF family
member interferon consensus sequence binding protein, ICSBP1, a lymphoid-specific
negative regulator, was the only gene that was expressed at a +++ level in tonsil B cells
and was shut down in both tonsil plasma cells and bone marrow plasma cells. These
results suggest that the removal of ICSBP1 from IRF binding sites may be an important
mechanism in regulating IRF4 function.
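The five-level scale used in Tables 11 and 12 can be written as a small mapping from an AD value and a present/absent call. This sketch assumes that an absent call is shown as "-", the symbol used in the tables. The text uses strict inequalities and leaves AD values of exactly 1,000, 5,000, and 10,000 unassigned, so placing those boundaries in the higher bin is an assumption. How values are combined "for all the samples in a group" is not specified and is left to the caller.

```python
def expression_level(present: bool, ad: float) -> str:
    """Map one present/absent call and AD value to the five-level scale."""
    if not present:
        return "-"      # absent call: undetectable transcript
    if ad < 1_000:
        return "+"
    if ad < 5_000:
        return "++"     # assumption: AD == 1,000 falls here
    if ad < 10_000:
        return "+++"    # assumption: AD == 5,000 falls here
    return "++++"       # assumption: AD == 10,000 falls here


if __name__ == "__main__":
    for present, ad in [(False, 20_000.0), (True, 500.0), (True, 3_000.0),
                        (True, 7_500.0), (True, 15_000.0)]:
        print(present, ad, expression_level(present, ad))
```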

The second most abundant class of EDGs codes for proteins involved in
signaling. CASP10, which is involved in the activation cascade of caspases responsible for
apoptosis execution, was the only signaling protein up-regulated in tonsil plasma cells.
Three small G proteins, the Rho family members ARHG and ARHH and the
proto-oncogene HRAS, were down-regulated EDGs. Two members of the tumor necrosis
factor family, TNF and lymphotoxin beta (LTB), as well as the TNF receptor binding
protein, were LDGs. Given the important role of IL-4 in triggering class-switch
recombination, the down-regulation (tonsil B cell to tonsil plasma cell) and eventual
extinction (tonsil plasma cell to bone marrow plasma cell) of IL4R expression fit well
with the differentiation states of the cells under study.

Finally, the down-regulation of the B lymphoid tyrosine kinase (BLK),
whose expression is restricted to B lymphoid cells and which may function in a signal
transduction pathway, suggests that the reduction of this kinase is important in the early
stages of plasma cell differentiation. Given the important role of cell adhesion in plasma
cell biology, the up-regulation of ITGA6 and PECAM1 could be of particular importance.
In fact, these genes also showed continued up-regulation in the tonsil plasma cell to bone
marrow plasma cell transition and represented the only extracellular adhesion genes in the
EDG class. Other multiple-member classes of down-regulated EDGs included those
involved in the cell cycle (CCNF, CCNG2, and CDC20) or DNA repair/maintenance
(TERF2, LIG1, MSH2, RPA1). The down-regulation of these genes may thus be
important in inducing and/or maintaining the terminally differentiated state of plasma
cells.

Late Differentiation Genes

In the top 50 LDGs, 33 genes were up-regulated or turned on and 17 were
down-regulated or turned off (Table 12). Although 16 EDGs were transcription factors,
only 5 LDGs belonged to this class. The BMI1 gene, which was an up-regulated EDG,
was also an LDG, indicating that the gene undergoes a significant increase in expression
in both the tonsil B cell to tonsil plasma cell and the tonsil plasma cell to bone marrow
plasma cell transitions. BMI1 was the only up-regulated transcription factor. MYBL1,
MEF2B, and BCL6 were shut down in bone marrow plasma cells, and the transcription
elongation factor TCEA1 was down-regulated. The largest class of LDGs (n = 16; 11 up-
and 5 down-regulated) coded for proteins involved in signaling. The LIM-containing
protein FHL1, which has both nuclear and focal adhesion localization, and the secreted
proteins JAG1 (a ligand for Notch), insulin-like growth factor IGF1, and bone
morphogenetic protein BMP6 were up-regulated. The dual-specificity phosphatase DUSP5
and the chemokine receptor CCR2 had the most dramatically altered expression: they
were turned on to extremely high levels in bone marrow plasma cells while being absent
in tonsil plasma cells. Additional signaling genes, including the caveolae membrane
proteins CAV1 and CAV2, plasma membrane proteins important in the transport of
materials and in organizing numerous signal transduction pathways, were up-regulated
LDGs.

Given the dramatic difference in life spans of tonsil plasma cells (several
days) and bone marrow plasma cells (several weeks to months), the up-regulation of the
anti-apoptotic gene BCL2 (- in tonsil B cells and ++ in bone marrow plasma cells) and
concomitant down-regulation of the apoptosis-inducing protein BIK (+++ in tonsil B
cells and - in bone marrow plasma cells) may be critical in regulating normal programmed
cell death. As in the EDGs, the LDGs contained multiple adhesion-related genes, and, as
in the EDGs, the LDG adhesion genes were all up-regulated.

The PECAM1 gene was found to be both an EDG and an LDG, suggesting
that a gradation of cell surface expression of this gene is critical in development. Whereas
the integrin family member ITGA6 was an EDG, ITGA4 was found to be an LDG. The
finding that ITGA4, or VLA-4 (very late antigen 4), was an LDG is consistent with
published data showing that this integrin is most predominant on late-stage plasma cells.
The adhesion molecule selectin P ligand (SELPLG), which mediates high-affinity,
calcium-dependent binding to P-, E- and L-selectins and thereby the tethering and rolling
of neutrophils and T lymphocytes on endothelial cells, may facilitate a similar mechanism
in late-stage plasma cells. In addition, epithelial membrane protein 3 (EMP3), an integral
membrane glycoprotein putatively involved in cell-cell interactions, was identified.
LRMP (JAW1), a lymphoid-restricted integral ER membrane protein that, based on strong
homology to MRVI1 (IRAG), is likely an essential nitric oxide/cGKI-dependent regulator
of IP3-induced calcium release from endoplasmic reticulum stores, was found to be a
down-regulated LDG. The discovery of LRMP as a down-regulated LDG is consistent
with previous studies showing that, although it is highly expressed in lymphoid
precursors, it is shut down in plasma cells.

Thus, the gene expression profiling results confirmed previous
observations and identified novel and highly significant changes in mRNA synthesis
when tonsil B cells are compared with tonsil plasma cells and tonsil plasma cells with
bone marrow plasma cells.
TABLE 11

Early-Stage Differentiation Genes: Top 50 Differentially Expressed Genes In
Comparison Of Tonsil B Cells And Tonsil And Bone Marrow Plasma Cells

Quantitative Gene 
Expression 



Accession Symbol Function TBC TPC BPC 



U60519 
++


X53586 


ITGA6 


adhesion 
++
++


++ 


JLi I JU07 
transcription;
ll allow ipilUll, 




++ 


1 } { 






repressor; PcG
PECAM1


adhesion 

uullvjlwll 




++ 


+++ 


U52682 


IRF4 


transcription; IRF 


+ 


+++ 


+++ 






family 








M31627 


XBP1 


transcription; 


+++ 


-H-f 


MM 
AB000410


OGG1




+






D87412 
Li J OOZU


\sULLJ 
GRPI
hindintr
Mo 1 1 5Z


AdL,Dd 


Ar>u transporter 








M85085 


CSTF2 


mRNA cleavage 


+ 










stimulating factor 








U74612 


FOXM1 


transcription; 


+ 










fork-head family 








U84726 


RAE1


RNA export 


+ 






V00574 


HRAS 


signaling; GTP 


+ 










binding protein 








X02910 


TNF 


signaling; TNF_ 


+ 






X63741 


EGR3 


transcription; egr 


+ 










family 








X93512 


TERF2 


telomere repeat 


+ 







binding protein 
Z36714 | CCNF | cell cycle; cyclin F | [illegible] | - | -
AB000409 | MNK1 | signaling; kinase | [illegible] | - | +
M33308 | VCL | cytoskeleton | + | - | ++
D16480 | HADHA | mitochondrial oxidation | ++ | - | -
M63488 | RPA1 | DNA replication/repair | ++ | - | -
U03911 | MSH2 | DNA repair | ++ | - | -
U69108 | TRAF5 | signaling; TNFR associated protein | ++ | - | -
X12517 | SNRPC | mRNA splicing | ++ | - | -
X52056 | SPI1 | transcription; ets family | ++ | - | -
X68149 | BLR1 | signaling; cxc receptor | ++ | - | -
X74301 | CIITA | transcription; adaptor | ++ | - | -
X75042 | REL | transcription; rel/dorsal family | ++ | - | -
L00058 | MYC | transcription; bHLHZip | ++ | - | [illegible]
M36067 | LIG1 | DNA ligase | ++ | + | +
M82882 | ELF1 | transcription; ets family | ++ | + | +
S76617 | BLK | signaling; kinase | ++ | + | +
U47414 | CCNG2 | cell cycle; cyclin G | ++ | + | [illegible]
U61167 | SH3D1B | unknown; SH3 containing protein | ++ | + | +
X61587 | ARHG | signaling; Rho G | ++ | + | +
Z35278 | RUNX3 | transcription; contains runt domain | ++ | [illegible] | +
[illegible] | [illegible] | transcription; repression; PcG | [illegible] | [illegible] | [illegible]
X69111 | ID3 | transcription; repression; bHLH | [illegible] | [illegible] | [illegible]
X52425 | IL4R | signaling; cytokine receptor | [illegible] | [illegible] | [illegible]
Z35227 | ARHH | signaling; Rho H | +++ | ++ | +
[illegible] | [illegible] | signaling; [illegible] | [illegible] | [illegible] | [illegible]
U05340 | CDC20 | cell cycle; activator of APC | +++ | ++ | +
X66079 | SPIB | transcription; ets family | +++ | ++ | +

Accession = GenBank accession number. Symbol = HUGO approved gene symbol.
TBC, tonsil B cell; TPC, tonsil plasma cell; BPC, bone marrow plasma cell; AD, mean
average difference; AC, absolute call. Quantitative gene expression: -, AC absent; +, AC
present and AD < 1,000; ++, AD = 1,000 to 5,000; +++, AD = 5,000 to 10,000; ++++,
AD > 10,000.
TABLE 12

Late-Stage Differentiation Genes: Top 50 Differentially Expressed Genes In Comparison
Of Tonsil And Bone Marrow Plasma Cells

Quantitative Gene Expression
Accession | Symbol | Function | TPC | BPC
U32114 | CAV2 | signaling; membrane caveolae | - | +
U60115 | FEU | signaling; LIM domain | - | +
U73936 | JAG1 | signaling; Notch ligand | [illegible] | [illegible]
X57025 | IGF1 | signaling; growth factor | - | [illegible]
Z32684 | XK | membrane transport | - | +
D10511 | ACAT1 | metabolism; ketone | [illegible] | ++
Y08999 | ARPC1A | actin polymerization | [illegible] | ++
M14745 | BCL2 | signaling; anti-apoptosis | - | ++
M24486 | P4HA1 | collagen synthesis | - | ++
M60315 | BMP6 | signaling; TGF family | - | ++
U25956 | SELPLG | adhesion | - | ++
X16983 | ITGA4 | adhesion | [illegible] | ++
Z18951 | CAV1 | signaling; membrane caveolae | - | ++
M60092 | AMPD1 | metabolism; energy | - | +++
U15932 | DUSP5 | signaling; phosphatase | - | ++++
U95626 | CCR2 | signaling; chemokine receptor | - | ++++
D78132 | RHEB2 | signaling; ras homolog | + | ++
L41887 | SFRS7 | mRNA splicing factor | [illegible] | ++
M23161 | LOC90411* | unknown | [illegible] | ++
M37721 | PAM | metabolism; hormone amidation | + | ++
M69023 | TSPAN-3* | unknown | + | ++
U02556 | TCTE1L | dynein homolog | + | ++
U41060 | LIV-1* | unknown | + | ++
U44772 | PPT1 | lysosome enzyme | + | ++
U70660 | ATOX1 | metabolism; antioxidant | + | ++
X92493 | PIP5K1B | signaling; kinase | + | ++
M23254 | CAPN2 | cysteine protease | + | +++
J02763 | S100A6 | signaling; calcium binding | ++ | +++
L13689 | BMI1 | transcription; repressor; PcG | ++ | +++
L34657 | PECAM1 | adhesion | ++ | +++
M23294 | HEXB | metabolism; hexosaminidase | ++ | +++
M64098 | HLDBP | metabolism; sterol | ++ | ++++
U52101 | EMP3 | adhesion | ++ | ++++
X66087 | MYBL1 | transcription; myb-like | + | -
X54942 | CKS2 | cell cycle; kinase regulator | ++ | -
X73568 | SYK | signaling; kinase | ++ | -
L08177 | EBI2 | signaling; receptor | ++ | [illegible]
M25629 | KLK1 | protease; serine | ++ | -
U00115 | BCL6 | transcription; Zn-finger | ++ | -
U23852 | LCK | signaling; kinase | ++ | [illegible]
U60975 | SORL1 | endocytosis | ++ | [illegible]
X63380 | MEF2B | transcription; MADS box | ++ | -
L25878 | EPHX1 | metabolism; epoxide hydrolase | ++ | +
Z35227 | ARHH | signaling; Rho H | ++ | +
X89986 | BIK | signaling; apoptosis | [illegible] | -
M13792 | ADA | metabolism; purine | +++ | +
U10485 | LRMP | ER membrane protein | +++ | +
M81601 | TCEA1 | transcription; elongation | +++ | ++
X70326 | MACMARCK | actin binding | ++++ | +
X56494 | PKM2 | metabolism; energy | ++++ | +

Accession = GenBank accession number. Symbol = HUGO approved gene symbol.
*Unapproved symbol. TPC, tonsil plasma cell; BPC, bone marrow plasma cell; AD,
mean average difference; AC, absolute call. Quantitative gene expression: -, AC absent;
+, AC present and AD < 1,000; ++, AD = 1,000 to 5,000; +++, AD = 5,000 to 10,000;
++++, AD > 10,000.
EXAMPLE 15

Previously Identified And Novel Genes In Plasma Cell Differentiation

In this gene expression profiling study, not only previously identified but
also novel genes associated with plasma cell development were identified. Some of the
genes that may be pertinent to plasma cell differentiation are discussed here.

Polyadenylation of mRNA is a complex process that requires multiple
protein factors, including three cleavage stimulation factors (CSTF1, CSTF2 and CSTF3).
It has been shown that the concentration of CSTF2 increases during B cell activation and
that this increase is sufficient to switch IgM heavy chain mRNA expression from the
membrane-bound form to the secreted form. The CSTF2 gene was expressed at low levels in
tonsil B cells but was turned off in tonsil and bone marrow plasma cells, indicating that
CSTF2 gene expression can also be used to define plasma cell differentiation.

The gene for CD63 showed a progressive increase in expression
across the three cell types studied. CD63 belongs to the transmembrane 4 superfamily
(TM4SF) of membrane proteins. It has been found on the intracellular
lysosomal membranes of hemopoietic precursors in bone marrow, macrophages,
platelets, and Weibel-Palade bodies of vascular endothelium. Importantly, CD63 has been
described as a marker for melanoma progression, and it regulates tumor cell motility,
adhesion, and migration on substrates associated with β1 integrins.

Most importantly, the novel genes reported herein should contribute
to a broader understanding of the molecular mechanisms involved in plasma cell
differentiation. Specifically, most of the top 50 EDGs were down-regulated, and
transcription factors formed the largest functional group among the EDGs, suggesting that
transcriptional regulation is an important mechanism for modulating differentiation.
Among the LDGs, transcription factors were much less represented than among the EDGs.

Cell Cycle Control and Programmed Cell Death

Consistent with the terminal differentiation of plasma cells, many genes
involved in cell cycle control and DNA metabolism were down-regulated EDGs. The
modulation of the DNA ligase LIG1, the DNA repair and replication genes MSH2 and RPA1,
the cell cycle regulator CDC20, and the cyclins CCNG2 and CCNF may have important
consequences for inducing the quiescent state of plasma cells. The telomeric repeat
binding protein TERF2, one of two recently cloned mammalian telomere binding protein
genes, was a down-regulated EDG. TERF2 protects telomere ends, prevents telomere
end-to-end fusion, and may be important in maintaining genomic stability. It is of interest
to determine whether TERF2 is down-regulated during the terminal differentiation of all cell
types, and whether the lack of this gene product in tumors of terminally differentiated cells
results in the high degree of chromosomal structural rearrangement that is a hallmark of
multiple myeloma, which lacks TERF2 gene expression (unpublished data).

The CDC28 protein kinase 2 gene CKS2, which binds to the catalytic
subunit of the cyclin dependent kinases and is essential for their biological function, was
the only cell cycle gene among the LDGs. It was expressed in tonsil plasma cells, which are
capable of modest proliferation; however, CKS2 was completely extinguished in bone
marrow plasma cells. Thus, shutting down CKS2 expression may be critical in ending the
proliferative capacity of bone marrow plasma cells.

A distinguishing feature of plasma cell terminal differentiation is the
acquisition of increased longevity by bone marrow plasma cells. This phenomenon is
likely controlled, at least in part, through the regulation of programmed cell death
(apoptosis). The finding that the anti-apoptotic gene BCL2 and the pro-apoptotic gene BIK
showed opposing shifts in expression is consistent with these two genes playing major
roles in extending the lifespan of bone marrow plasma cells.

Transcription Factors

Transcription factors formed the largest functional group among the differentially
expressed genes. Of the 50 EDGs, only 7 were up-regulated. IRF4 and XBP1, two genes
known to be up-regulated during plasma cell differentiation, were in this group. Both
genes were expressed at equal levels in tonsil and bone marrow plasma cells,
suggesting that expression of these important regulators does not continue to increase
beyond the tonsil plasma cell stage. Blimp-1 (PRDM1) is not represented on the HuGeneFL
microarray, but recent studies using the third-generation Affymetrix U95Av2 microarray
have also revealed an induction of Blimp-1 expression in plasma cells compared with
tonsil B cells (unpublished data), confirming the expected patterns of these transcription
factors.
The vast majority of EDGs were down-regulated, and the single largest
subgroup of the down-regulated EDGs represented transcription factors (13 of 43 genes).
Four of the 13 transcription factors, ETS1, SPI1, SPIB, and ELF1, belong to the ets family.
These results are consistent with previous studies showing that several of the ETS proteins
(ETS1, ELF1, PU.1 (SPI1), and SPI-B) are expressed in the B cell lineage. It is
interesting to note that the down-regulation of ETS1 in the transition from tonsil B
cells to tonsil plasma cells may be an important switch, as ETS1 knock-out mice show
dramatic increases in plasma cells in the spleen and peripheral blood. In addition, it is
curious that although SPI1 (PU.1) interacts with IRF4 in Blimp-1+ germinal center tonsil
B cells and plasma cells, the data presented herein show that whereas IRF4 is up-regulated
in the plasma cell transition, SPI1 is shut down in tonsil and bone marrow plasma cells.
Thus, these data support the notion that the ets family of transcription factors is
important in hematopoiesis and that down-regulation of at least four family members
appears to be an important event in the terminal differentiation of plasma cells.

The cytoskeletal gene vinculin (VCL) and the MAP kinase-interacting
serine/threonine kinase 1 gene (MKNK1) represented novel EDGs. Vinculin is thought to
function in anchoring F-actin to the membrane, whereas MKNK1 is an ERK substrate
that phosphorylates eIF4E after recruitment to the eIF4F complex by binding to eIF4G.
These two genes were turned off in the tonsil B cell to tonsil plasma cell transition but
were reactivated in bone marrow plasma cells. The MYC proto-oncogene also showed a
dramatic down-regulation in the tonsil B cell to tonsil plasma cell transition, with
reactivation in bone marrow plasma cells. It will be important to understand whether these
two genes are regulated, directly or indirectly, by MYC. One of the mechanisms by
which PRDI-BF1 promotes the generation of plasma cells is repression of MYC, thereby
allowing B cells to exit the cell cycle and undergo terminal differentiation. The
extinguishing of MYC observed in the present study in the tonsil B cell to tonsil plasma cell
transition is consistent with this mechanism. The reactivation of MYC in bone marrow plasma
cells to levels similar to those seen in tonsil B cells, which appear to be highly
proliferative blasts, remains unexplained but suggests that MYC may have dual roles.

Similar to the tonsil B cell to tonsil plasma cell transition, the majority of
the transcription factors were down-regulated in the tonsil to bone marrow plasma cell
transition. The BCL6 gene, although not among the top 50 EDGs, did make the
top 50 list for LDGs. BCL6 showed a progressive loss of expression from tonsil B
cells to tonsil plasma cells (see Table 10), followed by a dramatic loss of
expression in bone marrow plasma cells. Additional transcription factors, the myb-like
gene MYBL1 and the MADS box factor MEF2B, were also turned off in bone marrow
plasma cells and may be major regulators of the terminal stages of plasma cell
differentiation. The transcription elongation factor TCEA1 was down-regulated but
remained present. BMI1, a member of a vertebrate Polycomb complex that regulates
segmental identity by repressing HOX genes throughout development, showed a
significant progressive increase in expression across all groups. It is of note that BMI1 is
the human homolog of the mouse Bmi-1 proto-oncogene, originally discovered as
cooperating with transgenic c-Myc in inducing B cell lymphomas.

Given that changes in the expression levels of transcription
factors represent the most striking feature of plasma cell differentiation, it is of interest
to elucidate the distinct pathways of transcriptional regulation driven by the various classes
of transcription factors discovered herein. This can be done with the aid of global
expression profiling and sophisticated data mining tools such as Bayesian networks.

EXAMPLE 16 

Identification of Genes with Similar Expression Between Multiple Myeloma and Cells at
Different Stages of B Cell Development

Examples 16 and 17 describe the establishment of a B cell developmental 
stage-based classification of multiple myeloma using global gene expression profiling. 

To classify multiple myeloma with respect to the EDGs and LDGs reported
above, 74 newly diagnosed cases of multiple myeloma and 7 tonsil B cell, 7 tonsil plasma
cell, and 7 bone marrow plasma cell samples were tested for variance across the 359
EDGs and 500 LDGs disclosed above. The top 50 EDGs that showed the most
significant variance across all samples were defined as early differentiation genes for
myeloma (EDG-MM); likewise, the top 50 LDGs showing the most significant variance
across all samples were identified as late differentiation genes for myeloma 1 (LDG-MM1).
Subtracting the LDG-MM1 from the 500 LDGs and then applying a one-way
ANOVA test for variance to the remaining genes identified the top 50 genes showing
similarities between bone marrow plasma cells and multiple myeloma. These genes were
defined as LDG-MM2.
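The gene selection described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually used: expression values are assumed to be average-difference (AD) values stored per gene in sample order, and genes are ranked by the one-way ANOVA F statistic across the sample groups. Because every gene is tested on the same samples, the degrees of freedom are shared, so a larger F corresponds to a smaller p-value. The helper names (`one_way_f`, `group_values`, `top_by_anova`) and the toy data are hypothetical, and since the text does not say which groups were compared in the second (LDG-MM2) pass, the sketch simply reuses the same grouping.

```python
from statistics import fmean


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    values = [v for g in groups for v in g]
    grand_mean = fmean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        # No within-group spread: treat any between-group difference as maximal.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def group_values(values, labels):
    """Split one gene's per-sample values into groups by sample label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return list(groups.values())


def top_by_anova(expr, genes, labels, n=50):
    """Return the n genes with the largest F statistic (most significant variance)."""
    scored = [(one_way_f(group_values(expr[g], labels)), g) for g in genes]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [g for _, g in scored[:n]]


# Hypothetical toy data: two samples per group, values in AD units.
labels = ["TBC", "TBC", "TPC", "TPC", "BPC", "BPC", "MM", "MM"]
expr = {
    "geneA": [100, 120, 5000, 5200, 5100, 4900, 110, 90],
    "geneB": [800, 820, 790, 810, 805, 795, 800, 800],
    "geneC": [50, 60, 70, 40, 9000, 9500, 8800, 9100],
}
ldg = ["geneB", "geneC"]                          # stand-in for the 500 LDGs
ldg_mm1 = top_by_anova(expr, ldg, labels, n=1)    # most variable LDGs
remaining = [g for g in ldg if g not in ldg_mm1]  # subtract LDG-MM1
ldg_mm2 = top_by_anova(expr, remaining, labels, n=1)
print(ldg_mm1, ldg_mm2)  # ['geneC'] ['geneB']
```

With real data, `n=50` would be used and the p-value could also be reported; the ranking itself is unchanged because all genes share the same degrees of freedom.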

Within the top 50 EDG-MM (Table 13), 18 genes that showed up-regulation
in the tonsil B cell to tonsil plasma cell transition showed down-regulation in multiple
myeloma to levels at or below those seen in tonsil B cells. The remaining 32 EDG-MM showed
the reverse profile, in that these genes were down-regulated in the tonsil B cell to plasma cell
transition but showed tonsil B cell-like expression in multiple myeloma. In Table 13,
gene expression was described as being at 1 of 5 possible levels. An absent absolute call
(AAC), indicating an undetectable or absent gene transcript, was defined as "-". For all
the samples in a group, expression levels were defined as "+" if the gene transcript was
present and the average difference (AD) was < 1,000, "++" for 1,000 < AD < 5,000, "+++"
for 5,000 < AD < 10,000, and "++++" for AD > 10,000.
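The five-level scale can be expressed as a small lookup. This sketch is illustrative only: it classifies a single (absolute call, AD) pair, because the text does not say how the values of the individual samples in a group were combined into one level, and the handling of values falling exactly on 1,000, 5,000 or 10,000 is an assumption, since the stated ranges share their endpoints. The function name `expression_level` is hypothetical.

```python
def expression_level(ad, present):
    """Map an absolute call and an average difference (AD) to the table symbols.

    Endpoint handling (AD of exactly 1,000, 5,000 or 10,000) is an assumption.
    """
    if not present:  # absent absolute call
        return "-"
    if ad < 1000:
        return "+"
    if ad < 5000:
        return "++"
    if ad <= 10000:
        return "+++"
    return "++++"


for ad, present in [(250, False), (250, True), (3000, True), (7500, True), (12000, True)]:
    print(ad, present, expression_level(ad, present))
```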

One of the most striking genes defining EDG-MM was the cyclin
dependent kinase 8 gene (CDK8), which was absent in tonsil B cells, up-regulated to
extremely high levels in tonsil and bone marrow plasma cells, and then shut down again in
virtually all multiple myeloma cases. The mitotic cyclin showed a progressive loss in
expression from tonsil B cells (++) to tonsil plasma cells (+) to bone marrow plasma cells
(-), whereas multiple myeloma cases showed either bone marrow plasma cell-like levels or
tonsil B cell-like levels. Given that the tonsil B cells used in this study likely represent
highly proliferative centroblasts, multiple myeloma cases with similar levels might be
suggestive of a proliferative form of the disease. A total of 27 of the top 50 EDG-MM
showed no variability in multiple myeloma, i.e., all multiple myeloma and tonsil B cell
samples showed similar levels of expression.
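The range notation in the multiple myeloma (MM) columns of the tables (for example, -/++) and the notion of a gene showing no variability in myeloma can be made concrete with a small sketch. The text does not define either, so this sketch assumes that a range lists the lowest and highest levels observed among the myeloma samples, and that "no variability" means every myeloma sample falls at the same level as the reference group (for EDG-MM, the tonsil B cells). The names `LEVELS`, `level_range` and `matches_reference` are hypothetical.

```python
LEVELS = ["-", "+", "++", "+++", "++++"]  # ordered from absent to highest


def level_range(levels):
    """Summarize per-sample symbols as 'lowest/highest', or one symbol if constant."""
    ranks = sorted(LEVELS.index(symbol) for symbol in levels)
    low, high = LEVELS[ranks[0]], LEVELS[ranks[-1]]
    return low if low == high else f"{low}/{high}"


def matches_reference(mm_levels, reference_level):
    """True when every myeloma sample sits at the reference group's level."""
    return all(symbol == reference_level for symbol in mm_levels)


print(level_range(["-", "++", "+", "-"]))     # -/++
print(matches_reference(["++", "++"], "++"))  # True
```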

A majority (34 of 50) of the top 50 LDG-MM1 (Table 14) were genes
that showed up-regulation in the tonsil plasma cell to bone marrow plasma cell
transition but showed down-regulation to tonsil plasma cell levels in multiple
myeloma. The overall pattern seen for LDG-MM1 was the reverse of that seen for the
EDG-MM, where a majority of genes showed down-regulation from tonsil B cells to
plasma cells and up-regulation to tonsil B cell-like levels in multiple myeloma. The most
dramatic change among the LDG-MM1 was the massive up-regulation of the cxc
chemokines SDF1, PF4, and PPBP in bone marrow plasma cells, in contrast with the
complete absence of detectable transcripts in all multiple myeloma samples. These results are
validated by the fact that two separate and distinct probe sets interrogating different
regions of SDF1 (accession numbers L36033 and U19495) showed identical
patterns. The RB1 tumor suppressor gene showed a significant up-regulation in the
tonsil plasma cell (+) to bone marrow plasma cell (++) transition, with multiple myeloma
showing levels consistent with either cell type. Unlike the EDG-MM, only 15 of
the top 50 LDG-MM1 showed no variability within the multiple myeloma population.

The LDG-MM2 genes (Table 15), which showed similarities between bone
marrow plasma cells and subsets of multiple myeloma, all showed variability within
multiple myeloma, and this variability could be dramatic, e.g., for the apoptosis
inducer BIK. Unlike the EDG-MM and LDG-MM1, a large fraction of the LDG-MM2
were genes encoding metabolic enzymes, a majority of which are involved in glucose
metabolism.
TABLE 13

EDG-MM: Tonsil B Cell-like Multiple Myeloma Genes

Quantitative Gene Expression
Accession | Symbol | Function | TBC | TPC | BPC | MM
D28364 | ANXA2 | annexin family | [illegible] | + | + | -/+
U81787 | WNT10B | signaling; ligand | - | [illegible] | ++ | -/++
U88898 | LOC51581* | unknown | [illegible] | + | + | -/+
X12451 | CTSL | protease; cysteine | - | ++ | ++ | -
Z25347 | CDK8 | cell cycle; kinase | - | ++++ | [illegible] | -/++
D38548 | KIAA0076* | unknown | + | ++ | ++ | +/++
D86479 | AEBP1 | extracellular matrix | + | ++ | ++ | +
U04689 | OR1D2 | signaling; receptor | + | ++ | + | +
M31328 | GNB3 | signaling; G protein | [illegible] | ++ | ++ | +
U13395 | WWOX | metabolism; oxidoreductase | + | ++ | ++ | +
X14675 | BCR | signaling; GTPase for RAC | + | ++ | ++ | +
X16665 | HOXB2 | transcription; homeobox domain | + | ++ | ++ | -/+
Z11899 | POU5F1 | transcription; homeobox domain | [illegible] | ++ | ++ | +
Z36531 | FGL2 | secreted fibrinogen-like | + | ++ | ++ | +
X80907 | PIK3R2 | signaling; kinase adaptor | + | +++ | +++ | ++
D31846 | AQP2 | aquaporin | ++ | +++ | +++ | ++
L18983 | PTPRN | phosphatase; membrane | ++ | ++++ | ++++ | ++
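M23323 | CD3E | signaling; TCR partner | [illegible] | [illegible] | [illegible] | [illegible]
D83781 | KIAA0197* | unknown | [illegible] | [illegible] | [illegible] | [illegible]
HT4824 | CBS | metabolism; cystathionine-beta-synthase | [illegible] | [illegible] | [illegible] | [illegible]
S78873 | RABIF | signaling; GTP releasing factor | [illegible] | [illegible] | [illegible] | [illegible]
U32645 | ELF4 | transcription; ets domain | [illegible] | [illegible] | [illegible] | [illegible]
X97630 | EMK1 | signaling; kinase; ELK domain | [illegible] | [illegible] | [illegible] | [illegible]
Z24724 | UNKNOWN | cell cycle | [illegible] | [illegible] | [illegible] | [illegible]
D16480 | HADHA | mitochondrial oxidation | [illegible] | [illegible] | [illegible] | [illegible]
L77701 | COX17 | mitochondrial oxidation | [illegible] | [illegible] | [illegible] | [illegible]
M90356 | BTF3L2 | transcription; NAC domain | [illegible] | [illegible] | [illegible] | [illegible]
U08815 | SF3A3 | spliceosome | [illegible] | [illegible] | [illegible] | [illegible]
U53225 | SNX1 | intracellular trafficking | [illegible] | [illegible] | [illegible] | [illegible]
M25753 | CCNB1 | cell cycle | [illegible] | [illegible] | [illegible] | [illegible]
D87448 | TOPBP1 | topoisomerase II binding protein | [illegible] | [illegible] | [illegible] | [illegible]
L38810 | PSMC5 | 26S proteasome subunit 5 | [illegible] | [illegible] | [illegible] | [illegible]
M29551 | PPP3CB | signaling; calcium dependent phosphatase | [illegible] | [illegible] | [illegible] | [illegible]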
M32886 | SRI | signaling; calcium binding | ++ | + | + | ++
U24704 | PSMD4 | 26S proteasome subunit 4 | ++ | + | + | ++
U25165 | FXR1 | RNA binding protein | ++ | + | [illegible] | ++
U37022 | CDK4 | cell cycle; kinase | ++ | [illegible] | + | ++
U53003 | C21orf33 | unknown; highly conserved | ++ | + | [illegible] | ++
X89985 | BCL7B | actin cross-linking | ++ | + | + | ++
D49738 | CKAP1 | tubulin folding | [illegible] | + | + | ++
D43950 | CCT5 | chaperonin | +++ | ++ | ++ | +++
D82348 | ATIC | metabolism; purine biosynthesis | +++ | ++ | ++ | +++
D86550 | DYRK1A | signaling; kinase | [illegible] | ++ | ++ | +++
[illegible] | VDAC1 | anion channel | [illegible] | ++ | ++ | [illegible]
L43631 | SAFE | nuclear scaffold factor | [illegible] | ++ | ++ | ++/+++
M30448 | CSNK2B | signaling; casein kinase regulation | [illegible] | ++ | ++ | ++/++++
X76013 | [illegible] | metabolism; glutaminyl tRNA synthetase | +++ | ++ | ++ | ++/+++
D83735 | CNN2 | actin binding | ++++ | ++ | ++ | ++/++++
M86667 | NAP1L1 | nucleosome assembly | ++++ | ++ | ++ | +++
X04828 | GNAI2 | signaling; G protein | ++++ | ++ | ++ | ++/+++

Genes identified by one-way ANOVA analysis. Accession = GenBank accession
number. Symbol = HUGO approved gene symbol; unapproved symbols are marked by an
asterisk (*). TBC, tonsil B cells; TPC, tonsil plasma cells; BPC, bone marrow plasma cells;
AC, absolute call; AD, average difference. Quantitative gene expression: -, AC absent; +, AC
present and AD < 1,000; ++, AD = 1,000 to 5,000; +++, AD = 5,000 to 10,000; ++++,
AD > 10,000.
Table 14

LDG-MM1: Tonsil Plasma Cell-Like Multiple Myeloma Genes

Quantitative Gene Expression
Accession | Symbol | Function | TPC | BPC | MM
U90902 | 23612* | unknown; related to TIAM1 | - | + | -/++
D12775 | AMPD3 | metabolism; AMP deaminase | - | + | -/++
U37546 | BIRC3 | signaling; anti-apoptosis | - | ++ | -/++
Z11793 | SEPP1 | metabolism; selenium transport | [illegible] | +++ | -/+
L36033 | SDF1 | signaling; cxc chemokine | [illegible] | +++ | -
U19495 | SDF1 | signaling; cxc chemokine | [illegible] | +++ | -
M27891 | CST3 | protease inhibitor | - | ++++ | -/++++
M26602 | DEFA1 | immunity | - | ++++ | -/++++
M25897 | PF4 | signaling; cxc chemokine | - | ++++ | -
M54995 | PPBP | signaling; cxc chemokine | - | ++++ | -
U79288 | KIAA0511* | unknown | + | ++ | +/++
M59465 | TNFAIP1 | signaling; anti-apoptosis | + | ++ | +/++++
X53586 | ITGA6 | adhesion | + | ++ | +/++
D50663 | TCTEL1 | dynein light chain | + | ++ | +
U40846 | NAGLU | metabolism; heparan sulfate degradation | + | ++ | +/++
M80563 | S100A4 | signaling; calcium binding | + | ++ | +/++++
X04085 | CAT | metabolism; catalase | + | ++ | +/++
L02648 | TCN2 | metabolism; vitamin B12 transport | + | ++ | +
L35249 | ATP6B2 | lysosome; vacuolar proton pump | + | ++ | +
L09209 | APLP2 | amyloid beta precursor-like | + | ++ | +
L41870 | RB1 | cell cycle | + | ++ | +/++
X76732 | NUCB2 | signaling; calcium binding | + | +++ | +/+++
D29805 | B4GALT1 | adhesion | + | +++ | +
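M29877 | FUCA1 | lysosome; fucosidase | [illegible] | [illegible] | [illegible]
M32304 | TIMP2 | metalloproteinase 2 inhibitor | [illegible] | [illegible] | [illegible]
D10522 | MACS | actin cross-linking | [illegible] | [illegible] | [illegible]
L38696 | RALY* | RNA binding | [illegible] | [illegible] | [illegible]
U05875 | IFNGR2 | signaling; interferon gamma receptor | [illegible] | [illegible] | [illegible]
U78095 | SPINT2 | protease inhibitor; blocks HGF | [illegible] | [illegible] | [illegible]
L13977 | PRCP | lysosomal; angiotensinase C | [illegible] | [illegible] | [illegible]
U12255 | FCGRT | IgG Fc receptor | [illegible] | [illegible] | [illegible]
L06797 | CXCR4 | signaling; SDF1 receptor | [illegible] | [illegible] | [illegible]
D82061 | FABGL | metabolism | [illegible] | [illegible] | [illegible]
Y00433 | GPX1 | oxidation protection | [illegible] | [illegible] | [illegible]
M60752 | H2AFA | histone; nucleosome | [illegible] | [illegible] | [illegible]
U18300 | DDB2 | DNA repair | [illegible] | [illegible] | [illegible]
X63692 | DNMT1 | DNA methyltransferase | [illegible] | [illegible] | [illegible]
D11327 | PTPN7 | signaling; phosphatase | [illegible] | [illegible] | [illegible]
X54942 | CKS2 | cell cycle; kinase regulator | [illegible] | [illegible] | [illegible]
D14874 | ADM | adrenomedullin | [illegible] | [illegible] | [illegible]
D86976 | KIAA0223* | minor histocompatibility antigen | [illegible] | [illegible] | [illegible]
X52979 | SNRPB | mRNA splicing | [illegible] | [illegible] | [illegible]
Z49254 | MRPL23 | mitochondrial ribosomal protein | [illegible] | [illegible] | [illegible]
U66464 | HPK1 | signaling; kinase | [illegible] | [illegible] | [illegible]
U91903 | FRZB | signaling; WNT antagonist | [illegible] | [illegible] | [illegible]
D87453 | MRPS27 | mitochondrial ribosomal protein | [illegible] | [illegible] | [illegible]
X59932 | CSK | signaling; kinase | [illegible] | [illegible] | [illegible]
L17131 | HMGIY | transcription; high mobility group | [illegible] | [illegible] | [illegible]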
L19779 | H2AFO | histone; nucleosome | ++++ | ++ | [illegible]
U70439 | SSP29* | unknown | ++++ | +++ | +++/++++

Genes identified by one-way ANOVA analysis. Accession = GenBank accession
number. Symbol = HUGO approved gene symbol; unapproved symbols are marked by an
asterisk (*). TPC, tonsil plasma cells; BPC, bone marrow plasma cells; AC, absolute call;
AD, average difference. Quantitative gene expression: -, AC absent; +, AC present and
AD < 1,000; ++, AD = 1,000 to 5,000; +++, AD = 5,000 to 10,000; ++++, AD > 10,000.
Table 15

LDG-MM2: Bone Marrow Plasma Cell-like Multiple Myeloma Genes

Quantitative Gene Expression
Accession | Symbol | Function | BPC | MM
U61145 | EZH2 | transcription; SET domain | - | -/+
HT4000 | SYK | signaling; lymphocyte kinase | - | -/++
X89986 | BIK | signaling; apoptosis inducer | - | -/++++
D85181 | SC5DL | metabolism; sterol-C5-desaturase | + | -/+
M98045 | FPGS | metabolism; folylpolyglutamate synthase | + | -/++
L41559 | PCBD | transcription; enhances TCF1 activity | + | -/++
L25876 | CDKN2 | cell cycle; CDK inhibitor; phosphatase | + | +/++
U76638 | BARD1 | transcription; BRCA1 heterodimer | + | +/++
L05072 | IRF1 | transcription; IRF family | + | +/++
D87440 | KIAA025[illegible] | unknown | [illegible] | +/++
U02680 | PTK9 | tyrosine kinase | [illegible] | +/++
U28042 | DDX10 | oncogene; ATP-dependent RNA helicase | [illegible] | +/++
L20320 | CDK7 | [illegible] | + | +/++
X56494 | PKM2 | metabolism; pyruvate kinase | + | +/++++
M12959 | TCRA | signaling; T cell receptor | ++ | -/++
HT3981 | INSL3 | signaling; insulin-like peptide; IGF family | ++ | -/++
U21931 | FBP1 | metabolism; fructose bisphosphatase | ++ | -/++++
Z48054 | PXR1 | metabolism; peroxisome biogenesis | ++ | +/++
D84145 | WS-3* | dynactin 6 | ++ | +/++
D14661 | KIAA0105* | transcription; WT1-associating protein | ++ | +/++
X77548 | NCOA4 | transcription; nuclear receptor coactivator | ++ | +/++
M90696 | CTSS | cysteine protease | ++ | +/++
D11086 | IL2RG | cytokine receptor | ++ | +/++
U70426 | RGS16 | signaling; GTPase activating protein | ++ | +/+++
X14850 | H2AX | histone; required for antibody maturation | ++ | +/[illegible]
M29927 | OAT | metabolism; ornithine aminotransferase | ++ | +/+++
[illegible] | [illegible] | transcription; [illegible] | [illegible] | [illegible]
[illegible] | [illegible] | metabolism; glycogen biogenesis | [illegible] | [illegible]
M55531 | SLC2A5 | metabolism; fructose transporter | ++ | +/++++
M60750 | H2BFL | histone; nucleosome | ++ | +/[illegible]
L19437 | TALDO1 | metabolism; transaldolase | ++ | ++/+++
[illegible] | [illegible] | transcription; glucocorticoid receptor | [illegible] | [illegible]
L41887 | SFRS7 | mRNA splicing factor | ++ | ++/+++
M34423 | GLB1 | metabolism; galactosidase | ++ | ++/++++
X15414 | AKR1B1 | metabolism; aldose reductase | +++ | +/++++
J04456 | LGALS1 | signaling; inhibits CD45 phosphatase | +++ | +/++++
X92493 | PIP5K1B | signaling; kinase | +++ | +/++++
U51478 | ATP1B3 | Na+, K+ transporter | +++ | ++/[illegible]
X91257 | SARS | seryl-tRNA synthetase | +++ | ++/++++
D30655 | EIF4A2 | translation initiation | +++ | ++/[illegible]
D31887 | KIAA0062* | unknown | +++ | ++/++++
X04106 | CAPN4 | cysteine protease; calcium dependent | +++ | ++/[illegible]
D87442 | NCSTN* | nicastrin | ++ | [illegible]
L76191 | IRAK1 | signaling; cytokine receptor kinase | ++ | [illegible]
HT1428 | HBB | hemoglobin | ++ | -/++++
U44975 | COPEB | oncogene; transcription factor | ++ | -/++++
X55733 | EIF4B | translation initiation | ++ | +/++++
L09604 | PLP2 | signaling; colonic epithelium differentiation | ++ | [illegible]
HT1614 | PPP1CA | signaling; phosphatase | ++ | +++/++++
L26247 | SUI1 | translation initiation; probable | ++ | [illegible]


Accession = GenBank accession number or TIGR database identifier. Symbol = HUGO-approved gene symbol; unapproved symbols are marked by an asterisk (*). BPC, bone marrow plasma cells; AC, absolute call; AD, average difference. Quantitative gene expression: -, AC absent; +, AC present and AD < 1,000; ++, AD = 1,000 to 5,000; +++, AD = 5,000 to 10,000; ++++, AD > 10,000.
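The legend's binning of absolute calls and average differences can be written as a small lookup. This is a sketch only: the function name is hypothetical, and because the legend's ranges share their endpoints (1,000 and 5,000 each appear in two ranges), assigning a shared endpoint to the higher category is an assumption; 10,000 is kept in +++ because ++++ is defined as AD > 10,000.

```python
def expression_category(absolute_call_present: bool, average_difference: float) -> str:
    """Map an absolute call (AC) and average difference (AD) to the table's
    -, +, ++, +++, ++++ scale.

    Assumption: shared endpoints (1,000 and 5,000) go to the higher bin;
    10,000 stays in +++ because ++++ is defined as AD > 10,000.
    """
    if not absolute_call_present:
        return "-"      # AC absent
    if average_difference < 1_000:
        return "+"      # AC present, AD < 1,000
    if average_difference < 5_000:
        return "++"     # 1,000 <= AD < 5,000
    if average_difference <= 10_000:
        return "+++"    # 5,000 <= AD <= 10,000
    return "++++"       # AD > 10,000
```

For example, `expression_category(True, 7_200)` returns `"+++"`.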



EXAMPLE 17

Hierarchical Cluster Analysis with EDG-MM, LDG-MM1, and LDG-MM2 Reveals Developmental Stage-Based Classification of Multiple Myeloma

To determine whether the variability in gene expression seen in multiple myeloma (MM) could be used to discern subgroups of disease, hierarchical cluster analysis was performed on 74 newly diagnosed MM, 7 tonsil B cell, 7 tonsil plasma cell, and 7 bone marrow plasma cell samples using the EDG-MM (Figure 12), LDG-MM1 (Figure 13), and LDG-MM2 (Figure 14) genes. Hierarchical clustering was applied to all samples using 30 of the 50 EDG-MM; the remaining 20 genes were filtered out because their Max-Min value was < 2.5. This filtering was necessary because many of the top 50 EDG-MM showed no variability across MM and thus could not be used to distinguish MM subgroups. A similar clustering strategy was employed to cluster the samples using the 50 LDG-MM1 and 50 LDG-MM2.
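The filtering and clustering steps above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the source does not state the distance metric, the linkage rule, or whether the Max-Min < 2.5 cut-off was applied to raw or log-transformed values, so the choice of 1 - Pearson correlation with average linkage, and the helper names `max_min_filter`, `pearson`, and `average_linkage`, are assumptions.

```python
from itertools import combinations
from math import sqrt

def max_min_filter(expr, threshold=2.5):
    """Keep genes whose range (max - min) across samples is at least `threshold`."""
    return {gene: values for gene, values in expr.items()
            if max(values) - min(values) >= threshold}

def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b) if sd_a and sd_b else 0.0

def average_linkage(expr, samples):
    """Agglomerative clustering of samples with 1 - Pearson r as the distance.

    `expr` maps gene -> list of values ordered like `samples`.
    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    genes = sorted(expr)
    profiles = {s: [expr[g][i] for g in genes] for i, s in enumerate(samples)}
    clusters = [frozenset([s]) for s in samples]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average linkage: mean distance over all cross-cluster sample pairs.
            d = sum(1.0 - pearson(profiles[a], profiles[b])
                    for a in c1 for b in c2) / (len(c1) * len(c2))
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        merges.append((tuple(sorted(c1)), tuple(sorted(c2)), d))
    return merges
```

Filtering first keeps flat genes from diluting the correlations, which mirrors the stated reason for dropping the 20 invariant EDG-MM.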

The MM samples clustering with tonsil B cells, tonsil plasma cells, or bone marrow plasma cells were then identified to determine whether membership in these clusters correlated with the gene expression-defined MM subgroups (Table 16). Of the MM cases clustering tightly with the tonsil B cell samples, 13 of 22 were from the MM4 subgroup, accounting for a majority of all MM4 cases (13 of 18 MM4 samples). The distribution of gene expression-defined MM subgroups in the LDG-MM1-defined cluster was dramatically different: 14 of the 28 MM samples clustering with the tonsil plasma cell samples were from the MM3 subgroup (14 of 15 MM3 samples). LDG-MM2 again showed a strong correlation with the MM subgroups, in that 14 of the 20 MM cases in this cluster were from the MM2 subgroup (14 of 21 MM2 cases). Thus, the MM4, MM3, and MM2 subtypes of MM have similarities to tonsil B cells, tonsil plasma cells, and bone marrow plasma cells, respectively. MM1 was the only subgroup with no strong correlation with the normal cell counterparts tested here, suggesting that this class has unique characteristics yet to be uncovered.
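Enrichment of one subgroup within a cluster, such as 13 MM4 cases among the 22 MM samples in the tonsil B cell cluster when 18 of the 74 MM samples are MM4, can be quantified with a one-sided hypergeometric tail probability. The source does not state which test produced the P values in Table 16, so this is offered only as one standard choice; the function name is hypothetical and its output should not be read as reproducing the table.

```python
from math import comb

def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: all MM samples clustered; successes: samples in one subgroup;
    draws: MM samples falling in the normal cell-defined cluster.
    """
    total = comb(population, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, min(successes, draws) + 1)
    )
    return hits / total
```

For the EDG-MM cluster, `hypergeom_upper_tail(13, 74, 18, 22)` is on the order of 10^-5.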

The distribution of the four MM subgroups among the normal cell cluster groups was determined next (Table 17). Whereas all MM3 cases could be classified, 6 MM1, 5 MM2, and 3 MM4 cases did not cluster with any normal cell group in any of the three cluster analyses. Among all samples that could be clustered, there were strong correlations between gene expression-defined subgroups and normal cell types, with the exception of MM1. The data also show that 3 MM1, 2 MM2, 4 MM3, and 1 MM4 cases clustered in two groups. No samples were found in three groups, and cases clustering with two normal classes were, with one exception, always in adjacent, temporally appropriate groups. P241 was the exception: it clustered in the bone marrow plasma cell and tonsil B cell groups, which are not adjacent.

Because one of the EDG-MMs was found to be cyclin B1 (CCNB1) (Table 13), it was determined whether a panel of proliferation-associated genes recently found to be up-regulated in MM4 could be used to advance and validate the classification of MM4 as a so-called tonsil B cell-like form of MM. Box plots of the expression patterns of CCNB1, CKS1, CKS2, SNRPC, EZH2, KNSL1, PRKDC, and PRIM1 showed significant differences across all the groups tested, with a strong correlation between tonsil B cells and MM4 (Figure 15). Several important observations were made in this analysis. For all of the genes except SNRPC, expression was progressively reduced in the transition from tonsil B cells to tonsil plasma cells to bone marrow plasma cells. In addition, striking correlations were observed with PRIM1 (Figure 15). Although PRIM1 expression was significantly different across the entire group (P = 4.25 x 10^-5), there was no difference between tonsil B cells and MM4 (Wilcoxon rank sum [WRS] P = 0.1) or between tonsil plasma cells and MM3 (WRS P = 0.6). Given the important function of several transcription factors in driving and/or maintaining plasma cell differentiation, it was determined whether these factors showed altered expression across the groups under study. Although the other factors showed no significant changes, XBP1 (Figure 15) showed, as expected, an enormous up-regulation between tonsil B cells and tonsil plasma cells. However, the gene showed a reduction in bone marrow plasma cells and a progressive loss across the four MM subgroups, with MM4 showing the lowest level (P = 3.85 x 10^-10).
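The pairwise comparisons above (for example PRIM1 in tonsil B cells vs. MM4) use the Wilcoxon rank sum test. A minimal sketch follows, assuming the large-sample normal approximation with a tie correction and no continuity correction; the source does not say whether exact or approximate P values were computed, and the function names are hypothetical.

```python
from math import erfc, sqrt

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_rank_sum(x, y):
    """Two-sided rank-sum (Mann-Whitney U) test via the normal approximation.

    Returns (U statistic for x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0  # every value identical: no evidence of a shift
    z = (u - n1 * n2 / 2) / sqrt(var)
    return u, erfc(abs(z) / sqrt(2))
```

Because only ranks enter the statistic, a few extreme expression values in one group cannot dominate the comparison.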

Based on conventional morphological features, plasma cells have been thought to represent a homogeneous end-stage cell type. However, the phenotypic analysis and gene expression profiling disclosed herein demonstrate that plasma cells isolated from distinct organs can be recognized as belonging to distinct stages of development. Multiple myeloma plasma cells are derived from the bone marrow and are thought to represent a transformed counterpart of normal terminally differentiated bone marrow plasma cells. However, the dramatic differences in survival, which can range from several months to more than 10 years, suggest that multiple myeloma may represent a constellation of several subtypes of disease. Conventional laboratory parameters have not been particularly useful in segregating distinct disease subtypes with the robustness needed for adequate risk stratification. In addition, unlike the achievements in classifying leukemias and lymphomas based on nonrandom recurrent chromosomal translocations, the extreme karyotypic heterogeneity of multiple myeloma has made attempts at understanding the molecular mechanisms of the disease and at classification prediction virtually impossible.
In the studies presented here, many EDGs and LDGs were found to exhibit highly variable expression in multiple myeloma, suggesting that multiple myeloma might be amenable to a developmental stage-based classification. The results of this study indicate that multiple myeloma can in fact be classified based on similarities in gene expression with cells representing distinct stages of B cell differentiation. The agreement between this developmental stage-based system and the gene expression-based system reported above represents a critical affirmation of the validity of the developmental stage-based system.

Recent studies support the hypothesis that MM3 represents a tonsil plasma cell-like form of the disease. Microarray profiling with the U95Av2 GeneChip on 150 newly diagnosed patients (including the 74 described here), together with an analysis of chromosome 13 loss, has revealed a significant link between reduced RB1 transcripts and either monosomy or partial deletion of chromosome 13 (unpublished data). In these studies, it was observed that a number of multiple myeloma cases with or without chromosome 13 deletion had RB1 transcripts at levels comparable to those seen in normal tonsil plasma cells. FISH analysis with a bacterial artificial chromosome (BAC) covering RB1 demonstrated that these cases did not have interstitial deletions of the RB1 locus. Given that RB1 was found to be an LDG-MM1, it was determined whether the low levels of RB1 may be linked to tonsil plasma cell-like MM, i.e., MM3. Of 35 multiple myeloma cases with RB1 AD values < 1,100 (none of the 35 normal bone marrow plasma cell samples tested had an RB1 AD value below 1,100), 74% belonged to the MM3 class. In contrast, of 38 multiple myeloma cases lacking deletion 13 and having RB1 AD values greater than 1,100, only 21% belonged to the MM3 subtype (unpublished data).

Although there is a significant link between the cell development-based classification and the gene expression profiling-based classification disclosed herein, there are exceptions: although, as expected, the majority of the MM4 cases belonged to the tonsil B cell cluster, 5 MM3, 1 MM2, and 3 MM1 cases were also found in this cluster. The recognition that cases within one gene expression-defined subgroup could be classified in two normal cell-defined clusters suggests that these cases may have intermediate characteristics with distinct clinical outcomes. It will be of interest to determine whether the unsupervised gene expression-based system or the developmental stage-based system, alone or in combination, will allow the creation of a robust risk stratification system. This can be tested by allowing sufficient follow-up time on >150 uniformly treated multiple myeloma cases in which profiling was performed at diagnosis.

MM1 was the only gene expression-defined subgroup lacking strong similarities to any of the normal cell types analyzed in this study. It is possible that MM1 has similarities to either mucosal-derived plasma cells or peripheral blood plasma cells, which have recently been shown to represent a distinct type of plasma cell. Future studies will be aimed at establishing the developmental stage of this subtype.

The hypoproliferative nature of multiple myeloma, with labeling indexes in the clonal plasma cells rarely exceeding 1%, has led to the hypothesis that multiple myeloma is a tumor arising from a transformed, proliferative precursor cell that differentiates into terminally differentiated plasma cells. It has been shown that, in IgG- and IgA-secreting multiple myelomas, a bone marrow B cell population transcribes the multiple myeloma plasma cell-derived VDJ joined to IgM sequence. Other investigations have shown that the clonogenic cell in multiple myeloma originates from a pre-switched but somatically mutated B cell that lacks intraclonal variation. This hypothesis is supported by the recent use of single-cell and in situ reverse transcriptase-polymerase chain reaction to detect a high frequency of circulating B cells that share clonotypic Ig heavy-chain VDJ rearrangements with multiple myeloma plasma cells. Studies have also implicated these precursor cells in mediating the spread of disease and affecting patient survival.

Links between the gene expression patterns of subsets of multiple myeloma and cells representing different late stages of B cell differentiation further support this hypothesis, in that MM4 and MM3 may originate in a so-called "multiple myeloma stem cell." This hypothesis can be tested by isolating B cells from the tonsils, lymph nodes, or peripheral blood of MM3 and MM4 patients, differentiating them into plasma cells in vitro using a new method described by Tarte et al. (2002), and then testing for the presence of a multiple myeloma gene expression signature within the differentiated populations. Even if the multiple myeloma stem cell represents a minority population among the B cells, the multiple myeloma gene expression signature may be recognized, if not with conventional microarrays, then by more sensitive quantitative real-time RT-PCR. A real-time RT-PCR method is envisioned because expression profile models using as few as 20 genes that distinguish malignant multiple myeloma plasma cells from normal plasma cells with an accuracy of 99.5% have been developed (unpublished data).

Regardless of the outcome of these experiments, it is clear that gene expression profiling has become an extremely powerful tool for evaluating the molecular mechanisms of plasma cell differentiation and how these events relate to multiple myeloma development and progression, which in turn should provide more rational means of treating this currently fatal disease.



TABLE 16

Distribution of Multiple Myeloma Subgroups in Hierarchical Clusters Defined by EDG-MM, LDG-MM1, and LDG-MM2 Genes

                          Gene Expression-Defined MM Subgroups
Normal Cell-Defined       MM1        MM2        MM3        MM4
Cluster                   (n = 20)   (n = 21)   (n = 15)   (n = 18)

EDG-MM (n = 22)                                                      .00005
LDG-MM1 (n = 29)                                                     .000008
LDG-MM2 (n = 20)                                                     .000001






TABLE 17

Distribution of Gene Expression-Defined Multiple Myeloma Subgroup Cases in Normal Cell Clusters Defined by EDG-MM, LDG-MM1, and LDG-MM2

MM1, MM2, MM3, MM4, and PXXX represent gene expression-defined subgroups and patient identifiers, respectively. Y indicates that the case was found in the normal cell-defined cluster. Cases in italics were not found to cluster with any normal cell type. Some cases were found to cluster with two normal cell types. TBC, tonsil B cells; TPC, tonsil plasma cells; BPC, bone marrow plasma cells.



EXAMPLE 18

Diagnostic Models That Distinguish Multiple Myeloma, Monoclonal Gammopathy of Undetermined Significance, and Normal Plasma Cells

The molecular mechanisms of the related plasma cell dyscrasias monoclonal gammopathy of undetermined significance (MGUS) and multiple myeloma (MM) are poorly understood. Additionally, it can be difficult to differentiate these two disorders. This has important clinical implications because monoclonal gammopathy of undetermined significance is a benign plasma cell hyperplasia, whereas MM is a uniformly fatal malignancy. Monoclonal gammopathies are characterized by the detection of a monoclonal immunoglobulin in the serum or urine and an underlying proliferation of a plasma cell/B lymphoid clone. Patients with monoclonal gammopathy of undetermined significance have the least advanced disease and are characterized by a detectable plasma cell population in the marrow (< 10%) and secretion of a monoclonal protein detectable in the serum (< 30 g/L), but they lack clinical features of overt malignancy (such as lytic bone lesions, anemia, or hypercalcemia). Patients with overt MM have increased marrow plasmacytosis (> 10%) and serum M protein (> 30 g/L), and generally present with anemia, lytic bone disease, hypercalcemia, or renal insufficiency.
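To make the contrast concrete, the thresholds quoted above can be encoded as a toy rule. This is a hypothetical illustration built only from the figures in this paragraph, not a clinical diagnostic criterion; the `Presentation` record and `provisional_label` function are invented names, and cases that meet neither description are deliberately left "indeterminate", which is exactly where the text notes the two disorders are hard to tell apart.

```python
from dataclasses import dataclass

@dataclass
class Presentation:
    """Hypothetical record of the features named in the text."""
    marrow_plasma_cell_pct: float
    serum_m_protein_g_per_l: float
    lytic_lesions: bool = False
    anemia: bool = False
    hypercalcemia: bool = False
    renal_insufficiency: bool = False

def provisional_label(p: Presentation) -> str:
    """Toy rule using only the thresholds quoted above; not a clinical tool."""
    end_organ = p.lytic_lesions or p.anemia or p.hypercalcemia or p.renal_insufficiency
    if p.marrow_plasma_cell_pct < 10 and p.serum_m_protein_g_per_l < 30 and not end_organ:
        return "MGUS-like"
    if p.marrow_plasma_cell_pct > 10 and p.serum_m_protein_g_per_l > 30:
        return "MM-like"
    # Mixed pictures fall between the two textbook descriptions.
    return "indeterminate"
```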

Approximately 2% of all monoclonal gammopathy of undetermined significance cases convert to overt multiple myeloma per year, but it is virtually impossible to predict which cases will convert. A further difficulty in the clinical management of multiple myeloma is the extreme heterogeneity in survival, which can range from as little as two months to more than eight years, with only 20% of this variability accounted for by current clinical laboratory tests. Thus, there is a great need for more robust methods of classifying and stratifying these diseases.
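The 2% annual conversion figure can be turned into a cumulative risk under a simplifying assumption that the source does not make: that the rate is constant and independent from year to year. The function name is hypothetical.

```python
def cumulative_conversion_risk(annual_rate: float, years: int) -> float:
    """Probability of converting at least once within `years`,
    assuming a constant, independent annual conversion rate."""
    return 1.0 - (1.0 - annual_rate) ** years
```

Under that assumption, 10 years at 2% per year gives a cumulative risk of about 18%.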

This example reports on the application of a panel of statistical and data mining methodologies to classify multiple myeloma (MM), monoclonal gammopathy of undetermined significance, and normal plasma cells. The expression of approximately 12,000 genes in highly purified plasma cells was analyzed on a high-density oligonucleotide microarray. Various methodologies applied to the global gene expression data identified a class of genes whose altered expression is capable of discriminating normal and malignant plasma cells, as well as classifying some monoclonal gammopathy of undetermined significance cases as "like" MM and others as "unlike" MM. The extremely high predictive power of this small subset of genes, whose products are involved in a variety of cellular processes (e.g., adhesion and signaling), suggests that their deregulated expression may not only prove useful in the creation of molecular diagnostics, but may also provide important insight into the mechanisms of MM development and/or conversion from the benign condition of monoclonal gammopathy of undetermined significance to the overtly malignant and uniformly fatal MM.

Six different methodologies were employed herein: logistic regression, decision trees, support vector machines (SVM), Ensemble of Voters with the 20 best information gain genes (EOV), naive Bayes, and Bayesian networks. All six models were run on microarray data derived from Affymetrix (version 5) high-density oligonucleotide microarray analysis. One hundred fifty-six untreated MM samples, 34 healthy samples, and 32 samples designated as monoclonal gammopathy of undetermined significance were compared. The normalization algorithm available in the Affymetrix software was used. Information on normalization and standardization of the microarray data is available on Affymetrix's website.


Statistical And Data Mining Methodologies 

Various methods were employed with two goals in mind. The first goal is to identify genes whose over- or under-expression is apparent in the comparison of healthy samples, monoclonal gammopathy of undetermined significance samples, and malignant MM (multiple myeloma) samples. The second goal is to identify optimal methods for analyzing microarray data, and specifically methods applicable to microarray data from MGUS and MM patients. This is the first work to simultaneously identify discriminatory genes and create models that predict and describe the differences between myeloma, monoclonal gammopathy of undetermined significance, and healthy samples.

For each of the methods (and each of the comparisons), 10-fold cross validation was employed to estimate the prediction error. In 10-fold cross validation, one-tenth of the data (the 'test' data) is removed, and the entire model is created using only the remaining 90% of the data (the 'training' data). The test data are then run through the training model, and any misclassifications are noted. Error rates were computed by compiling the misclassifications from each of the 10 independent runs. Empirical results suggest that 10-fold cross validation may provide better accuracy estimates than the more common leave-one-out cross validation (Kohavi, 1995).
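The procedure above can be sketched generically: shuffle the samples, split them into 10 folds, rebuild the whole model on the other nine folds, and pool the misclassifications. This is an illustration only; the shuffling seed, the unstratified fold assignment, and the toy one-gene nearest-mean classifier used to exercise it are assumptions, and all names are hypothetical.

```python
import random

def ten_fold_error(samples, labels, fit, predict, k=10, seed=0):
    """Pooled misclassification rate from k-fold cross validation.

    `fit(train_x, train_y)` returns a model and `predict(model, x)` returns a
    label. The entire model (including any gene selection) is rebuilt inside
    each fold so the held-out fold never influences training.
    """
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = 0
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        errors += sum(predict(model, samples[i]) != labels[i] for i in test)
    return errors / len(samples)

def nearest_mean_fit(xs, ys):
    """Toy one-gene classifier: remember each class's mean value."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: sum(v) / len(v) for y, v in groups.items()}

def nearest_mean_predict(model, x):
    return min(model, key=lambda y: abs(x - model[y]))
```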

Logistic Regression

The logistic procedure creates a model that is linear in the log-odds and yields a number between zero and one. This value represents a predictive probability, for example, of belonging to the multiple myeloma group (predictive value close to one) or to the normal group (predictive value close to zero). This structure conveys the uncertainty in predicting the group membership of future samples. For example, a new sample with a predictive probability of 0.53 would be classified as multiple myeloma, albeit with less confidence than another sample whose predictive probability is 0.99.
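A minimal sketch of how such a predictive probability arises: the log-odds are a linear function of the features, and the logistic (sigmoid) function maps them to a value between zero and one. The source does not say how the coefficients were estimated or which genes entered the model; plain, unregularized batch gradient ascent on the log-likelihood and the function names below are assumptions made for illustration.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Batch gradient ascent on the log-likelihood; ys are 1 (MM) or 0 (normal)."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        b += lr * grad_b / n
        w = [wi + lr * g / n for wi, g in zip(w, grad_w)]
    return w, b

def predict_proba(model, x):
    """Predictive probability of the MM class (values near 1 mean MM)."""
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
```

A sample scoring 0.53 and one scoring 0.99 would both be called MM, but the score itself carries the difference in confidence described above.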
Decision Trees

Decision tree induction algorithms begin by finding the single feature that is most correlated with class. For the present discussion, mutual information (information gain) was used, and the classes were multiple myeloma vs. normal, multiple myeloma vs. monoclonal gammopathy of undetermined significance, and monoclonal gammopathy of undetermined significance vs. normal. For each feature, the algorithm computes the information gain of the detection call and of the optimal split point for the real-valued measure (signal). Information gain is defined as follows. The entropy of a data set is $H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$, where $p$ is the fraction of samples that are of a certain class. A split takes one data set and divides it into two data sets: the set of data points for which the feature has a value below the split point (or a particular nominal value) and the set of data points for which the feature has a value above the split point (or any other nominal value). The information gain of the split is the entropy of the original data set minus the size-weighted average of the entropies of the two resulting data sets.
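The entropy formula and the search for the optimal split point on one real-valued feature can be sketched as follows. Information gain here is the entropy before the split minus the size-weighted entropy of the two sides. Placing candidate thresholds midway between adjacent sorted values and restricting to two classes are assumptions of this sketch, and the function names are hypothetical.

```python
from math import log2

def entropy(labels):
    """Binary entropy -p log2 p - (1-p) log2(1-p); assumes two classes."""
    if not labels:
        return 0.0
    p = labels.count(labels[0]) / len(labels)
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)

def best_split(values, labels):
    """Return (information_gain, split_point) for one real-valued feature."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = (0.0, None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        split = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        after = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - after > best[0]:
            best = (base - after, split)
    return best
```

For a gene that perfectly separates three normal from three MM samples, the gain equals the full starting entropy of 1 bit.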

Ensembles

Even with pruning, decision trees can sometimes overfit the data. One approach to avoid overfitting is to learn the n best simple decision trees and let these trees vote on each new case to be predicted. The simplest decision tree is a decision stump, a decision tree with a single internal node, or decision node. The "Ensemble of Voters" (EOV) approach is an unweighted majority vote of the top 20 decision stumps, scored by information gain.
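An ensemble of decision stumps in the spirit of the EOV description can be sketched as: fit one best-threshold stump per gene, keep the n stumps with the highest information gain, and take an unweighted majority vote. Labeling each side of a stump with its majority training class, and breaking vote ties by first occurrence, are assumptions of this sketch; the function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values()) if n else 0.0

def fit_stump(values, labels):
    """Best single-threshold split on one gene: (gain, threshold, left_label, right_label)."""
    pairs = sorted(zip(values, labels))
    base, n = entropy(labels), len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best is None or gain > best[0]:
            thr = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (gain, thr, Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    return best

def fit_eov(expr, labels, n_voters=20):
    """expr: gene -> values per sample. Keep the n_voters stumps with the highest gain."""
    stumps = []
    for gene, values in expr.items():
        s = fit_stump(values, labels)
        if s is not None:
            stumps.append((s[0], gene, s[1], s[2], s[3]))
    stumps.sort(reverse=True)
    return stumps[:n_voters]

def predict_eov(stumps, sample):
    """sample: gene -> value. Unweighted majority vote of the retained stumps."""
    votes = Counter(left if sample[gene] < thr else right
                    for _, gene, thr, left, right in stumps)
    return votes.most_common(1)[0][0]
```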

Naive Bayes 

Naive Bayes is so named because it makes the (often) naive assumption that all features (e.g., gene expression levels) are conditionally independent given the class value (e.g., MM or normal). In spite of this naive assumption, it often works very well in practice. Like logistic regression, naive Bayes returns a probability distribution over the class values. The model simply takes the form of Bayes' rule with the naive conditional independence assumption.
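A minimal naive Bayes sketch: the class posterior is proportional to the class prior times a product of per-gene likelihoods, which is exactly Bayes' rule under the conditional independence assumption. The source does not say how continuous expression values were modeled; the Gaussian per-gene likelihood, the small variance floor, and the function names are assumptions.

```python
from math import exp, log, pi

def fit_gaussian_nb(xs, ys):
    """Per class: prior, and per-gene mean and variance (Gaussian likelihood assumed)."""
    model = {}
    for c in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Variance floor avoids division by zero for a constant gene.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(zip(*rows), means)]
        model[c] = (n / len(xs), means, variances)
    return model

def predict_proba_nb(model, x):
    """Posterior over classes: prior times the product of per-gene likelihoods."""
    log_post = {}
    for c, (prior, means, variances) in model.items():
        lp = log(prior)
        for v, m, s2 in zip(x, means, variances):
            lp += -0.5 * log(2 * pi * s2) - (v - m) ** 2 / (2 * s2)
        log_post[c] = lp
    top = max(log_post.values())  # subtract the max before exp for stability
    unnorm = {c: exp(lp - top) for c, lp in log_post.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}
```

Working in log space keeps the product of many small per-gene likelihoods from underflowing when thousands of genes are used.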

Bayesian Networks

Bayesian networks (Bayes nets) are a very different form of graphical model from decision trees. Like decision trees, the nodes in a Bayes net correspond to features, or variables. For classification tasks, one node also corresponds to the class variable. A Bayes net is a directed acyclic graph (DAG) that specifies a joint probability distribution over its variables. Arcs between nodes specify dependencies among variables, while the absence of arcs can be used to infer conditional independencies. By capturing conditional independence where it exists, a Bayes net can provide a much more compact representation of the joint distribution than a full joint table or other representation. There is much current research into the development of algorithms to construct Bayes net models from data (Friedman et al., 1999; Murphy, 2001; Pe'er et al., 2001). Bayes nets have proven to be outstanding tools for classification. For example, in KDD Cup 2001, an international data mining competition with over 100 entries, the Bayes net learning algorithm PowerPredictor was the top performer on a data set with strong similarities to microarray data (Cheng et al., 2000). This is the algorithm employed in the present study.
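The compactness described above comes from the standard factorization of a Bayes net's joint distribution over its DAG, where $\mathrm{Pa}(X_i)$ denotes the parents of node $X_i$; classification then reads the class posterior off the joint. Naive Bayes is the special case in which the class node $C$ has no parents and every gene node has $C$ as its only parent.

```latex
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr),
\qquad
P(C = c \mid \mathbf{x}) = \frac{P(C = c, \mathbf{x})}{\sum_{c'} P(C = c', \mathbf{x})}
```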

Support Vector Machines 

Support vector machines (SVMs) (Vapnik, 1998; Cristianini and Shawe-Taylor, 2000) are another novel data mining approach that has proven to be well suited to gene expression microarray data (Brown et al., 1999; Furey et al., 2000). At its simplest level, a support vector machine is an algorithm that attempts to find a linear separator between the data points of two classes. Support vector machines seek to maximize the margin, or separation, between the two classes. Maximizing the margin can be viewed as an optimization task that can be solved with quadratic programming techniques. Support vector machines based on "kernel methods" can efficiently identify separators that belong to other functional classes. A commonly used kernel is the Gaussian kernel. Nevertheless, for gene expression microarray data, it has been repeatedly demonstrated empirically that simple linear SVMs give better performance than SVMs with other kernels (Brown et al., 1999; Furey et al., 2000).
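A linear SVM can be sketched without a quadratic programming solver by minimizing the regularized hinge loss with stochastic sub-gradient steps (the Pegasos scheme). This is a swapped-in training method chosen for brevity, not the solver or software used in the study, which the source does not name; the regularization strength, epoch count, and function names are assumptions.

```python
import random

def train_linear_svm(xs, ys, lam=0.01, epochs=2000, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the regularized hinge loss.

    ys must be +1 or -1. A constant 1.0 feature is appended so the last weight
    acts as the bias (note: the bias is therefore regularized too).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]      # shrink: L2 penalty
            if margin < 1.0:                               # hinge loss is active
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def svm_decision(w, x):
    """Signed score; positive means the +1 class."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
```

Points with margin of at least 1 stop contributing to the updates, which is why only the samples nearest the boundary (the support vectors) shape the final separator.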
Results

As mentioned, each model was tested using 10-fold cross validation to obtain error (misclassification) rates. For each of the 10 runs, 10% of the samples were removed and the prediction model was created from the remainder. Then, using the created model, the removed test samples were assigned to groups and the accuracy was recorded. After all 10 runs were completed, the accuracy values were accumulated into the following table (Table 18).

TABLE 18

Ten-Fold Cross Validation Results (% correctly classified)

               MM vs. Normal        MM vs. MGUS          MGUS vs. Normal
               MM       Normal      MM       MGUS        MGUS     Normal
Logistic       98.72%   91.18%      89.1%    18.8%       90.63%   97.06%
Trees          97.44%   94.12%      87.18%   37.5%       90.63%   94.12%
SVM            98.72%   97.06%      89.10%   34.38%      90.63%   100%
Bayes Net      98.72%   100%        93.56%   34.38%      90.63%   97.06%
EOV            98.08%   100%        57.69%   68.75%      90.63%   100%
Naive Bayes    98.08%   100%        91.67%   43.75%      90.63%   100%



No single methodology stands out from the rest in predicting group membership. In the difficult classification of multiple myeloma (MM) vs. MGUS, the Ensemble of Voters classifies the most MGUS samples correctly (68.75%) but the fewest multiple myeloma samples correctly (57.69%). Naive Bayes produces the best classification, though it does not seem to be appreciably better than the other methods. All the methods appear to classify multiple myeloma vs. normal quite well, and MGUS vs. normal almost as well.

To test for differences in accuracy across procedures, a paired t-test was performed on the overall correct classification rate for each of the comparisons, pairing the procedures fold by fold. No method differed significantly from the others (p ≤ 0.05) except the EOV in the MGUS vs. multiple myeloma test, where the paired t-tests comparing the EOV with each of the other 5 models give p-values between 0.002 and 0.031 (unadjusted for multiple comparisons). According to this test, the EOV has a significantly lower rate of correct classification, though it is the most accurate MGUS classifier, as shown above. In comparing two groups, this is often the trade-off between sensitivity and specificity.
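The fold-by-fold comparison above can be sketched with a paired t statistic: take the difference in correct classification rate between two procedures on each fold, then divide the mean difference by its standard error. Converting t to a p-value needs the t distribution, which the standard library lacks, so this sketch stops at the statistic; for 10 folds (9 degrees of freedom), |t| > 2.262 corresponds to a two-sided p < 0.05. The function name and example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched per-fold scores."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs))), len(diffs) - 1
```

Pairing by fold removes the fold-to-fold variation shared by both procedures, which makes the test more sensitive than comparing the two sets of accuracies independently.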

Models for predicting group membership were identified for each method. The models classifying multiple myeloma vs. MGUS had more overlapping genes (17 unique genes) than the models classifying multiple myeloma vs. normal (12 unique genes) or MGUS vs. normal (10 unique genes). A possible explanation is that numerous genes distinguish multiple myeloma and normal samples because the two groups are quite distinct, whereas the genetic similarity between multiple myeloma and MGUS leaves fewer genes that differ across the two groups. This dearth of distinguishing genes forces any good model to contain some of the same limited number of genes. A more detailed discussion of the particular genes is given in the conclusion.

Meta-Voting

As an additional step to improve the prediction capabilities of the method, a "meta" prediction value was calculated. For each of the logistic regression, support vector machine, and Bayes net procedures, the marginal predicted group was calculated, and a final prediction was then given as the top-voted group: a sample is classified in a group if at least two of the three methods predict that group. The calculations indicate that the meta-voting procedure does not improve the results.
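The two-of-three rule can be sketched in a few lines. The function name is hypothetical; with only two classes and three voters a majority always exists, so the `None` branch only matters if more than two labels are possible.

```python
from collections import Counter

def meta_vote(predictions):
    """Return the label predicted by at least two of three classifiers, else None."""
    label, count = Counter(predictions).most_common(1)[0]
    return label if count >= 2 else None
```

For example, `meta_vote(["MM", "MGUS", "MM"])` returns `"MM"`.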

Receiver Operating Characteristic (ROC) Curves

A receiver operating characteristic (ROC) curve shows the relationship between sensitivity (correct prediction of the more diseased group) and specificity (correct prediction of the less diseased group). Figure 16 gives the ROC curves for the comparison of MM (multiple myeloma) vs. MGUS classification. This comparison is challenging for all the methods. For example, naive Bayes has a high sensitivity, but at the cost of low specificity. Even for mediocre values of specificity, the sensitivity drops off quite rapidly. To achieve a high sensitivity with any of the methods (that is, to miss very few multiple myeloma cases), the ability to predict MGUS accurately (specificity) was compromised.
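An ROC curve like the one described can be traced by sweeping a decision threshold over a classifier's MM scores and recording (1 - specificity, sensitivity) at each step, with the area under the curve computed by the trapezoid rule. This is a generic sketch; the scores in any example are hypothetical, and the source does not describe how its curves were computed.

```python
def roc_points(scores, is_mm):
    """(1 - specificity, sensitivity) for each threshold; score >= threshold calls MM."""
    pos = sum(is_mm)
    neg = len(is_mm) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, d in zip(scores, is_mm) if s >= thr and d)
        fp = sum(1 for s, d in zip(scores, is_mm) if s >= thr and not d)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

Lowering the threshold raises sensitivity only by also calling more MGUS samples MM, which is the trade-off the text describes.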

Prediction of MGUS 

The models that classify the multiple myeloma and normal samples into distinct groups may also be usable as predictive models for samples that are not clearly in either group based on clinical data. As a whole, the MGUS samples are clinically healthy (except for high levels of immunoglobulins) but genetically appear malignant. Applying the multiple myeloma vs. normal model to the MGUS samples indicates which group the MGUS samples more closely resemble. Table 19 provides the prediction distribution of the MGUS samples into the multiple myeloma and normal groups based on the model that compared multiple myeloma with normal samples. On average, about 90% of the MGUS samples are classified as multiple myeloma and about 10% as normal. One possible reason is that the 10% classified as normal may have longer survival times and less disease progression. Regardless, the similarity of MGUS to multiple myeloma (even in a model derived without any MGUS samples) provides additional evidence that MGUS is genetically much more similar to multiple myeloma than to normal samples. From both the prediction of the dichotomous groups and the classification of MGUS samples into the two extreme groups, it can be concluded that the methods are not notably different.

To better understand the mechanisms behind the poor classification of the MGUS samples (when compared with multiple myeloma), the number of MGUS samples classified as multiple myeloma by each of three methods (logistic regression, SVM, and Bayes net) was tabulated. The misclassification counts for the 32 MGUS samples are given in Table 20. Twenty-six MGUS samples were misclassified using the logistic procedure; 17 of the 26 were also misclassified using SVM, and 18 of the 26 were misclassified using Bayes net. This cross tabulation indicates that the same MGUS samples are consistently misclassified, which lends evidence to a possible subset of MGUS samples that are genetically similar to the multiple myeloma samples.
TABLE 19

MM vs. Normal (predicting MGUS)

               % MGUS classified as
               MM         Normal
Logistic       87.5%      12.5%
Trees          93.75%     6.25%
SVM            93.75%     6.25%
Bayes Net      93.75%     6.25%
EOV            84.37%     15.63%
Naive Bayes    93.75%     6.25%



TABLE 20

# MGUS misclassified    Logistic    SVM    Bayes Net
Logistic                26          17     18
SVM                                 21     17
Bayes Net                                  21



Discussion 

Six different statistical and data mining algorithms were examined for their ability to discriminate normal, hyperplastic, and malignant cells based on the expression patterns of ~12,000 genes. The models were highly accurate in distinguishing normal plasma cells from abnormal cells. However, these models uniformly failed to discriminate between the hyperplastic cells and the malignant cells. A major goal of this study was to develop or modify data mining tools in order to capture, from massive gene expression data sets, a small subset of genes that accurately distinguishes groups of cells, e.g., normal, precancerous, and cancerous cells, with the ultimate goal of creating sensitive and reproducible molecular diagnostic tests. In addition, future studies can be aimed at using a similar strategy to identify a minimum subset of genes capable of discriminating subgroups of disease for risk stratification and prognostics. This is a particularly important concept for this disease, as overall survival in multiple myeloma is highly variable, with some patients surviving as long as 10 years while others die within several months of diagnosis. Current microarray studies require the isolation of large numbers of cells, which necessitates advanced facilities and expertise. The studies described in this example represent a first step toward streamlining this process, as a smaller subset of genes (10-20) with high predictive power allows a massive reduction in scale, which in turn will make a commercial test more amenable to mass production and hence widespread clinical use.

One possible reason for the inability of the models to discriminate monoclonal gammopathy of undetermined significance from multiple myeloma is that MGUS represents at least two different diseases. The overlap in misclassification of MGUS samples shown in Tables 19-20 supports this view. In simplistic terms, MGUS can be viewed as a disease that will either remain indolent or convert to overt malignancy. Accruing sufficient numbers of stable and progressive MGUS cases, along with sufficient follow-up time, will help resolve this issue.

The failure of the models to differentiate the two disease types could be related to the limitations of the current methodologies. The microarray profiling used here interrogated only one third of the estimated 35,000 human genes (International Human Genome Sequencing Consortium, 2001; Venter et al., 2001), so a whole-genome survey might reveal discriminating features. The new Affymetrix U133 GeneChip system, which is thought to interrogate all human genes, may be used to address this question. It is also possible that a whole-genome analysis will reveal no significant differences. Such a result could reflect any of several possibilities: (1) there is no genetic difference between the two diseases; (2) only the MGUS samples that are classified as multiple myeloma are genetically similar to multiple myeloma, and clinical tests are unable to identify that distinction; (3) the current microarray technology is not specific enough to measure the differences between the two diseases; or (4) the methods described above are not appropriate for this type of analysis. If (1) or (2) is true, these results would point to other determinants of an indolent or malignant course. Such determinants could include genetic predisposition or somatic DNA mutations not manifest in gene expression, a unique environmental exposure interacting with these predisposing genetic traits, or a non-tumor cell microenvironment or "soil" that promotes plasma cell growth.
Another goal of this work was to use the models of global gene expression profiling to define critical genetic alterations that accompany two transitions of the plasma cell: from normal homeostasis to benign hyperplasia, and from hyperplasia to overt malignancy. Integration of the data from the six models revealed a group of genes that were found in two or more of the models. For the purposes of this study, these genes were interpreted as the most differentially expressed genes in these transitions. Ten common genes were identified in the normal to MGUS (monoclonal gammopathy of undetermined significance) comparison, and 8 of these genes were down-regulated or shut down in the abnormal cells. A similar pattern was seen in the normal versus multiple myeloma comparison, in which 9 of 12 common genes were down-regulated. By contrast, in the MGUS versus multiple myeloma comparison almost half of the probe sets (8 of 18 probe sets, representing 17 unique genes) were up-regulated in multiple myeloma. Probe sets for 4 different chemokine genes were down-regulated in the latter group of each of the 3 comparisons: SCYA23 (Normal vs. MGUS), SDF1 (Normal vs. MM), and SCYC2 and SCYA18 (MGUS vs. MM). Two probe sets for SCYA18 were found in the MGUS vs. MM comparison, which provides important validation that the expression of this gene truly differs between the two conditions. Chemokines are important mediators of immune responses and act as soluble factors that induce the migration of specific immune cells to sites of inflammation. The potential significance of the loss of expression of multiple chemokine genes in plasma cell dyscrasias is not understood. However, this loss may point to a mechanism by which tumors suppress anti-tumor immune reactions.
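The integration step described above keeps the genes that were selected by two or more of the six models. It can be sketched as a simple vote count, followed by a split into down- and up-regulated genes based on group means. This is a minimal Python sketch under stated assumptions. The function names, the gene labels GENE_A to GENE_G, and the expression values are hypothetical. The model names follow Table 19, but the selections assigned to them are invented.

```python
from collections import Counter

def consensus_genes(selections, min_models=2):
    """Genes selected by at least `min_models` models, in sorted order.

    `selections` maps a model name to the genes (or probe sets) it selected;
    duplicates within one model are counted once.
    """
    votes = Counter(g for genes in selections.values() for g in set(genes))
    return sorted(g for g, n in votes.items() if n >= min_models)

def split_by_direction(genes, mean_before, mean_after):
    """Partition genes into down- and up-regulated lists by comparing group means.

    Genes with equal means are placed in neither list.
    """
    down = [g for g in genes if mean_after[g] < mean_before[g]]
    up = [g for g in genes if mean_after[g] > mean_before[g]]
    return down, up

# Hypothetical selections per model (not the study results).
selections = {
    "Logistic": ["GENE_A", "GENE_B", "GENE_C"],
    "Trees": ["GENE_A", "GENE_D"],
    "SVM": ["GENE_B", "GENE_E"],
    "Bayes Net": ["GENE_A"],
    "EOV": ["GENE_F"],
    "Naive Bayes": ["GENE_C", "GENE_G"],
}
common = consensus_genes(selections)

# Hypothetical mean expression in normal and MGUS plasma cells.
normal = {"GENE_A": 5.0, "GENE_B": 3.0, "GENE_C": 2.0}
mgus = {"GENE_A": 1.0, "GENE_B": 1.5, "GENE_C": 4.0}
down, up = split_by_direction(common, normal, mgus)
print(f"{len(down)} of {len(common)} common genes down-regulated: {down}")
```

Counting a gene once per model, via `set(genes)`, keeps a model that lists two probe sets for the same gene from casting two votes for that gene.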

As with SCYA18, two independent probe sets for the human homologue of the Drosophila melanogaster gene frizzled (FZD2) were down-regulated in the normal to MGUS transition. FZD2 codes for a membrane-bound receptor that binds a highly conserved family of soluble ligands known as WNTs. WNT signaling regulates homeotic patterning and cell-fate decisions in multicellular organisms ranging from flies to humans. The Wnt signaling cascade has also been shown to be involved in neoplasia: hyperactivation of the Wnt-1 gene by viral insertional mutagenesis caused spontaneous mammary tumorigenesis in mice. Loss of FZD2 expression in MGUS is suspected to carry potential significance, because expression profiling has revealed deregulated expression of multiple members of the WNT signaling pathway in multiple myeloma and plasma cell leukemia (results shown above; Zhan et al., 2002; De Vos et al., 2001). Results in previous examples presented above also show that FRZB, a secreted antagonist of WNT signaling, exhibits elevated expression in multiple myeloma compared with normal plasma cells (Zhan et al., 2002; De Vos et al., 2001). The concomitant, or possibly sequential, down-regulation of the functional WNT receptor (FZD2) and up-regulation of a decoy receptor strongly suggests that disruption of WNT signaling plays a pathological role in multiple myeloma development. In addition to abnormalities in the receptor and decoy genes, the genes for the ligands WNT5A and WNT10B have been identified as altered in multiple myeloma (results shown above; Zhan et al., 2002). WNT5A is up-regulated in multiple myeloma, whereas WNT10B is expressed at high levels in normal plasma cells but not in a majority of multiple myeloma plasma cells (Zhan et al., 2002). Notably, recent studies have demonstrated that Wnt-5A, Wnt-2B, Wnt-10B, and Wnt-11 comprise a novel class of hematopoietic cell regulators.

Taken together, these findings suggest that deregulated autocrine and/or paracrine Wnt signaling may play a pivotal role in plasma cell dyscrasias. They also suggest that a progressive deregulation of multiple components of the signaling complex may be associated with disease progression from normal plasma cells to hyperplastic but benign MGUS, and then to overt multiple myeloma. In conclusion, it is anticipated that strategies like those employed here will allow the creation of new molecular diagnostic and prognostic tests and will provide useful insight into the genetic mechanisms of neoplastic transformation.

The following references are cited herein: 
Brown et al., Support vector machine classification of microarray gene expression data. UCSC-CRL 99-09, Department of Computer Science, University of California Santa Cruz, Santa Cruz, CA (1999).
Chauhan et al., Oncogene 21:1346-1358 (2002).
Cheng et al., KDD Cup 2001. SIGKDD Explorations 3:47-64 (2000).
Cristianini and Shawe-Taylor, An Introduction to Support Vector Machines and other kernel-based learning methods. Cambridge University Press (2000).
De Vos et al., Blood 98:771-80 (2001).
Friedman et al., Learning Bayesian network structure from massive datasets: the "sparse ... Intelligence (1999).
Furey et al., Bioinformatics 16:906-914 (2000).
International Human Genome Sequencing Consortium, Initial sequencing and analysis of the human genome. Nature 409:860-921 (2001).
Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) (1995).
Murphy, The Bayes Net Toolbox for Matlab. Computing Science and Statistics: Proceedings of the Interface (2001).
Pe'er et al., Inferring Subnetworks from Perturbed Expression Profiles. Proc of Ninth Intnl Conf on Intelligent Systems for Mol Biol (2001).
Shaughnessy et al., Blood 96:1505-1511 (2000).
Vapnik, Statistical Learning Theory. John Wiley & Sons (1998).
Venter et al., Science 291:1304-51 (2001).
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' Any patents, or publications .mentioned in. this specification are. indicative 

20 of the levels of those skilled in the art to which the invention pertains. Further, these 
patents and publications are incorporated by reference herein to the same extent as if 
each individual publication was specifically and individually indicated to be incorporated 
by reference. 

One skilled in the art will readily appreciate that the present invention is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those objects, ends and advantages inherent herein. The present examples, along with the methods, procedures, treatments, molecules, and specific compounds described herein, are presently representative of preferred embodiments, are exemplary, and are not intended as limitations on the scope of the invention. Changes therein and other uses will occur to those skilled in the art, and these are encompassed within the spirit of the invention as defined by the scope of the claims.
1. A method of gene expression-based classification for multiple
myeloma, comprising the steps of:
isolating plasma cells from individuals with or without multiple myeloma;

isolating nucleic acid samples from said plasma cells;
hybridizing said nucleic acid samples to a DNA microarray; and
performing hierarchical clustering analysis on data obtained from said
hybridization, wherein said clustering analysis will classify said individuals with or
without multiple myeloma into distinct subgroups.



2. The method of claim 1, wherein said subgroups of multiple 
myeloma are MM1, MM2, MM3 and MM4. 






3. A method of identifying genes with elevated expression in subsets 
of multiple myeloma patients, comprising the steps of: 

isolating plasma cells from individuals with multiple myeloma; 
isolating nucleic acid samples from said plasma cells;

hybridizing said nucleic acid samples to a DNA microarray; and 

performing hierarchical clustering analysis on data obtained from said 
hybridization, wherein said clustering analysis will identify genes with elevated 
expression in subsets of multiple myeloma patients. 




4. The method of claim 3, wherein said genes have accession number 
selected from the group consisting of M64347, U89922, X67325, X59798, U62800, 
U35340, X12530, X59766, U58096, U52513, X76223, X92689, D17427, L11329,
L13210, U10991, L10373, U60873, M65292, HT4215, D13168, AC002077, M92934,
X82494, and M30703. 
5. A method of identifying potential therapeutic targets for multiple
myeloma, comprising the steps of:

isolating plasma cells from individuals with or without multiple myeloma;

isolating nucleic acid samples from said plasma cells;
hybridizing said nucleic acid samples to a DNA microarray;

performing hierarchical clustering analysis on data obtained from said
hybridization; and

identifying genes with significantly different levels of expression in
multiple myeloma patients as compared to normal individuals, wherein said genes are
potential therapeutic targets for multiple myeloma.



6. The method of claim 5, wherein said potential therapeutic targets
for multiple myeloma are genes that have accession number selected from the group
consisting of L36033, M63928, U64998, M20902, M26602, M21119, M14636,
M26311, M54992, X16832, M12529, M15395, Z74616, HT2152, U97105, U81787,
HT3165, M83667, L33930, D83657, M11313, M31158, U24577, M16279, HT2811,
M26167, U44111, X59871, X67235, U19713, Y08136, M97676, M64590, M20203,
M30257, M93221, S75256, U97188, Z23091, M34344, M25897, M31994, Z31690,
S80267, U00921, U09579, U78525, HT5158, X57129, M55210, L77886, U73167,
X16416, U57316, Y09022, M25077, AC002115, Y07707, L22005, X66899, D50912,
HT4824, U10324, AD000684, U68723, X16323, U24183, D13645, S85655, X73478,
L77701, U20657, M59916, D16688, X90392, U07424, X54199, L06175, M55267,
M87507, M90356, U35637, L06845, U81001, U76189, U53225, X04366, U77456,
L42379, U09578, Z80780, HT4899, M74088, X57985, X79882, X77383, M91592,
X63692, M60752, M96684, U16660, M86737, U35113, X81788, HT2217, M62324,
U09367, X89985, L19871, X69398, X05323, X04741, D87683, D17525, M64347,
U89922, X67325, X59798, U62800, U35340, X12530, X59766, U58096, U52513,
X76223, X92689, D17427, L11329, L13210, U10991, L10373, U60873, M65292,
HT4215, D13168, AC002077, M92934, X82494, and M30703.
7. A method of identifying a group of genes that can distinguish
between normal plasma cells and plasma cells of multiple myeloma, comprising the steps
of:

isolating plasma cells from individuals with or without multiple myeloma;
isolating nucleic acid samples from said plasma cells;

hybridizing said nucleic acid samples to a DNA microarray;
identifying differential gene expression patterns that are statistically
significant; and

applying linear regression analysis to identify a group of genes, wherein
said group of genes is capable of accurate discrimination between normal plasma cells and
plasma cells of multiple myeloma.

8. The method of claim 7, wherein said genes have accession number
HT5158, L33930, L42379, L77886, M14636, M26167, U10324, U24577, U35113,
X16416, X64072, X79882, Z22970, and Z80780.

9. A method of identifying a group of genes that can distinguish
between subgroups of multiple myeloma, comprising the steps of:

isolating plasma cells from individuals with multiple myeloma;
isolating nucleic acid samples from said plasma cells;
hybridizing said nucleic acid samples to a DNA microarray;
identifying differential gene expression patterns that are statistically
significant; and

applying linear regression analysis to identify a group of genes, wherein
said group of genes is capable of accurate discrimination between subgroups of multiple
myeloma.

10. The method of claim 9, wherein said genes have accession number
X54199, M20902, X89985, M31158, U44111, X16416, HT2811, D16688, U57316,
U77456, D13645, M64590, L77701, U20657, L06175, M26311, X04366, AC002115,
X06182, M16279, M97676, U10324, S85655, and X63692.



11. A method of diagnosis for multiple myeloma, comprising the steps
of:

isolating plasma cells from an individual;

examining expression of a group of 14 genes within said plasma cells, said
14 genes have accession numbers HT5158, L33930, L42379, L77886, M14636,
M26167, U10324, U24577, U35113, X16416, X64072, X79882, Z22970, and Z80780;
and

performing statistical analysis on the expression levels of said genes,
wherein a statistically significant value of said analysis indicates that said individual has
multiple myeloma.

12. The method of claim 11, wherein the expression of said 14 genes is
examined at the nucleic acid level or protein level.

13. A method of diagnosis for subgroups of multiple myeloma,
comprising the steps of:

isolating plasma cells from an individual;

examining expression of a group of 24 genes within said plasma cells, said
24 genes have accession numbers X54199, M20902, X89985, M31158, U44111,
X16416, HT2811, D16688, U57316, U77456, D13645, M64590, L77701, U20657,
L06175, M26311, X04366, AC002115, X06182, M16279, M97676, U10324, S85655,
and X63692; and

performing statistical analysis on the expression levels of said genes,
wherein a statistically significant value of said analysis provides diagnosis for subgroups
of multiple myeloma.

14. The method of claim 13, wherein the expression of said 24 genes is
examined at the nucleic acid level or protein level.



15. A method of treatment for multiple myeloma, comprising the step
of:

inhibiting expression of a gene that has accession number selected from the
group consisting of U09579, U78525, HT5158, X57129, M55210, L77886, U73167,
X16416, U57316, Y09022, M25077, AC002115, Y07707, L22005, X66899, D50912,
HT4824, U10324, AD000684, U68723, X16323, U24183, D13645, S85655, X73478,
L77701, U20657, M59916, D16688, X90392, U07424, X54199, L06175, M55267,
M87507, M90356, U35637, L06845, U81001, U76189, U53225, X04366, U77456,
L42379, U09578, Z80780, HT4899, M74088, X57985, X79882, X77383, M91592,
X63692, M60752, M96684, U16660, M86737, U35113, X81788, HT2217, M62324,
U09367, X89985, L19871, X69398, X05323, X04741, D87683, D17525, M64347,
U89922, X67325, X59798, U62800, U35340, X12530, X59766, U58096, U52513,
X76223, X92689, D17427, L11329, L13210, U10991, L10373, U60873, M65292,
HT4215, D13168, AC002077, M92934, X82494, and M30703.

16. A method of treatment for multiple myeloma, comprising the step
of:

increasing expression of a gene that has accession number selected from
the group consisting of L36033, M63928, U64998, M20902, M26602, M21119,
M14636, M26311, M54992, X16832, M12529, M15395, Z74616, HT2152, U97105,
U81787, HT3165, M83667, L33930, D83657, M11313, M31158, U24577, M16279,
HT2811, M26167, U44111, X59871, X67235, U19713, Y08136, M97676, M64590,
M20203, M30257, M93221, S75256, U97188, Z23091, M34344, M25897, M31994,
Z31690, S80267, and U00921.
17. A method of developmental stage-based classification for multiple
myeloma, comprising the steps of:

(a) isolating plasma cells and B cells from normal individuals;

(b) isolating nucleic acid samples from said plasma cells and B cells;
(c) hybridizing said nucleic acid samples to a DNA microarray;

(d) performing hierarchical clustering analysis on data obtained from said
hybridization, wherein said clustering analysis will identify genes that classify said
plasma cells and B cells according to their developmental stages;

(e) isolating multiple myeloma plasma cells from individuals with multiple
myeloma;

(f) isolating nucleic acid samples from said multiple myeloma plasma cells;

(g) hybridizing nucleic acid samples of (f) to a DNA microarray;

(h) performing hierarchical clustering analysis on data obtained from (d)
and (g), wherein said clustering analysis will classify said multiple myeloma plasma cells
according to the developmental stages of normal B and plasma cells.

18. The method of claim 17, wherein said plasma cells are isolated
from an organ selected from the group consisting of tonsil, bone marrow, mucosal tissue,
lymph node and peripheral blood.

19. The method of claim 17, wherein said B cells are isolated from an
organ selected from the group consisting of tonsil, bone marrow, lymph node and
peripheral blood.

20. A method of discriminating normal, hyperplastic and malignant
plasma cells, comprising the steps of:

obtaining gene expression data by DNA microarray; and

performing statistical analysis on said data by a method selected from the
group consisting of logistic regression, decision trees, ensembles, naive Bayes, Bayesian
networks and support vector machines, wherein said analysis discriminates among
normal, hyperplastic and malignant plasma cells.

21. A method of identifying a gene with altered expression between
normal and malignant plasma cells, comprising the steps of:

obtaining gene expression data by DNA microarray; and
performing statistical analysis on said data by a method selected from the
group consisting of logistic regression, decision trees, ensembles, naive Bayes, Bayesian
networks and support vector machines, wherein said analysis would identify a gene with
altered expression between normal and malignant plasma cells.
SEQUENCE LISTING
<110> Shaughnessy, John

Zhan, Fenghuang

Barlogie, Bart 

<120> Diagnosis, Prognosis and Identification of 
Potential Therapeutic Targets of Multiple 
Myeloma Based on Gene Expression Profiling 

<130> D6432PCT 

<141> 2002-11-07 

<150> US 60/348,238 

US 60/355,386 
US 60/403,075 